Abstract

To understand the effect of assortative mating on the genetic evolution of a population,
we consider a finite population in which each individual has a type, determined by a se- 
quence of n diallelic loci. We assume that the population evolves according to a Moran 
model with weak assortative mating, strong recombination and low mutation rates. With 
an appropriate rescaling of time, we obtain that the evolution of the genotypic frequencies 
in a large population can be approximated by the evolution of the product of the allelic 
frequencies at each locus, and the vector of the allelic frequencies is approximately gov- 
erned by a diffusion. We present some features of the limiting diffusions (in particular 
their boundary behaviour and conditions under which the allelic frequencies at different 
loci evolve independently). If mutation rates are strictly positive then the limiting diffusion
is reversible and, under some assumptions, the critical points of the stationary density can 
be characterised. 
1 Introduction

The aim of this paper is to construct and analyse a diffusion approximation for a diallelic
multilocus reproduction model with assortative mating, recombination and mutation. Our 
starting point is a variant of the Moran model. We suppose that the population is monoecious^1,
haploid^2 and of constant size $N$. This will be an overlapping generation model, but, in contrast
to the usual Moran framework, we suppose that reproduction takes place at discrete times 1,
2, .... In each time step, a mating event occurs between two individuals $I_1$ and $I_2$; $I_1$ is
replaced by an offspring, so that the size of the population is kept constant. The genotype of
the offspring is obtained from those of $I_1$ and $I_2$ through a process of recombination followed by
mutation which we make precise in §2. In the classical Moran model, the two individuals $I_1$ and
$I_2$ are chosen at random from the population. Here, to study the effects of assortative mating,
we assume that the first individual, $I_1$, is still chosen at random, but the second individual,
$I_2$, is sampled with a probability that depends on its genotype and on the genotype of the
first selected individual. The genotype of an individual is composed of a finite number, $n$, of
loci with two alleles per locus denoted by 0 and 1. To characterise the assortative mating, we
introduce a real parameter $s_{i,j}$ for every pair of genotypes $(i,j)$. If $I_1$ has genotype $i$, then,
in the draw of $I_2$, an individual with genotype $j$ has a probability proportional to
$1 + \frac{1}{N} s_{i,j}$ of being selected.

Diffusion approximations for different selection-mutation models have been studied extensively
in the one-locus case (see, for example, Ethier & Kurtz 1986, Chapter 10). The coefficients
$1 + \frac{1}{N} s_{i,j}$ of our model play the same role as the (viability) selection coefficients in a
Wright-Fisher model for a diploid population. Since they depend on the types of both parents,
they result in nonlinear (frequency dependent) selection (see §4.4). Ethier & Nagylaki
(1989) study two-locus Wright-Fisher models for a panmictic^3, monoecious, diploid population
of constant size under various assumptions on selection and recombination. Depending on the
strength of the linkage between the two loci, they obtain different types of diffusion approximation:
limiting diffusions for gametic^4 frequencies if the recombination fraction multiplied
by the population size tends to a constant as the size tends to $+\infty$ (so-called tight linkage)

^1 Every individual has both male and female sexual organs.

^2 Each cell has one copy of each chromosome.

^3 Every individual is equally likely to mate with every other.

^4 Gametes are produced during reproduction. A gamete contains a single copy of each chromosome, composed
of segments of the two chromosomes in the corresponding parent. Two gametes, one from each parent, fuse to
produce an offspring.
and limiting diffusions for allelic frequencies if the recombination fraction multiplied by the
population size tends to $+\infty$ (so-called loose linkage). To our knowledge, this work has not
been extended to more general multilocus models with recombination and assortative mating.
Nevertheless, there is a large body of work on multilocus genetic systems. Most theoretical 
investigations assume that the size of the population is infinite, so that the random genetic drift 
can be ignored; the evolution of genotypic frequencies is then described by recursive equations 
or by differential equations (see Christiansen 2000 and references within). A comparison
between infinite and finite population models with random mating is presented in Baake & Herms
(2008). A review of several simulation studies can be found in the introduction of Devaux & 
Lande (2008). Among these, the 'species formation model', introduced by Higgs & Derrida 
(1992) inspired our work. In their model, mating is only possible between individuals with 
sufficiently similar genotypes, so that from the point of view of reproduction the population 
is split into isolated subgroups. Their simulations display a succession of divisions and extinc- 
tions of subgroups. In this paper we generalise their assortative mating criterion to one defined 
through the family of parameters Sij and provide a general theoretical treatment. 

To give an overview of our results, we first consider a particular pattern of assortative 
mating. Let us assume that the frequency of matings between two individuals of types i 
and j depends only on the number of loci at which their allelic types differ (and not on the 
positions of those loci along the genome). We then have a model with n + 1 assortment 
parameters, denoted by $s_0, \ldots, s_n$, obtained by setting $s_{i,j} = s_k$ if the genotypes $i$, $j$ are
different at exactly $k$ loci (regardless of their positions). This mating criterion will be called
the Hamming criterion in what follows. A decreasing sequence $s_0 > s_1 > \ldots > s_n$ will
describe a positive assortative mating (individuals mate preferentially with individuals that
are similar). An increasing sequence $s_0 < s_1 < \ldots < s_n$ will describe a negative assortative
mating (individuals mate preferentially with individuals that are dissimilar). 

We establish a weak convergence of the Markov chain describing the genetic evolution of 
the population as its size tends to $+\infty$, under a hypothesis on the recombination distribution
that corresponds to loose linkage (during each reproduction event, recombination between any
pair of loci occurs with a positive probability) and under the assumption that mutations occur
independently at each locus with the same rates (at each locus, the rate of mutation of a
type 0 allele to a type 1 allele is $\mu_0/N$ and the rate of mutation of a type 1 to a type 0 is $\mu_1/N$).
In particular, while mutation and assortment parameters are rescaled with population size, 
recombination is not. As a result, we see a separation of timescales. Due to recombination,
the genotypic frequencies rapidly converge to a product distribution which is characterised by
its marginals, that is by the 0-allelic frequencies at each locus. We show that, at a slower
rate, the allelic frequencies converge to a multidimensional diffusion, whose components are 
coupled only through an infinitesimal drift term (in the mathematical sense) arising from the 
assortative mating. 

Let us describe some features of the limiting diffusion. If $s_1 - s_0 = s_2 - s_1 = \ldots = s_n - s_{n-1}$
then the frequencies of the 0-allele at each locus evolve according to independent Wright-Fisher
diffusions with mutation rates $\mu_0$ and $\mu_1$ and symmetric balancing selection with strength
$\frac{1}{2}(s_1 - s_0)$; that is they solve the following stochastic differential equation:

\[ dx_t = \sqrt{x_t(1 - x_t)}\, dW_t + \Big(\mu_1(1 - x_t) - \mu_0 x_t + (s_1 - s_0)(1/2 - x_t)\, x_t(1 - x_t)\Big)\, dt, \]

where $(W_t)_{t \ge 0}$ is a standard Brownian motion. In all other cases, the allelic frequencies
at different loci no longer evolve independently. Instead the vector of 0-allelic frequencies
$(x_t(1), \ldots, x_t(n))$ is governed by the stochastic differential equation:

\[ dx_t(i) = \sqrt{x_t(i)(1 - x_t(i))}\, dW_t(i)
   + \Big(\mu_1(1 - x_t(i)) - \mu_0 x_t(i) + (1/2 - x_t(i))\, x_t(i)(1 - x_t(i))\, P(x_t^{(i)})\Big)\, dt, \qquad (1.1) \]

where $(W_t(1))_{t \ge 0}, \ldots, (W_t(n))_{t \ge 0}$ denote $n$ independent standard Brownian motions and $P(x^{(i)})$
is a symmetric polynomial function of the $n - 1$ variables $x(j)(1 - x(j))$, $j \in \{1, \ldots, n\} \setminus \{i\}$,
whose coefficients depend only on the parameters $s_1 - s_0, \ldots, s_n - s_{n-1}$. More precisely, $P(x^{(i)})$
is an increasing function of each parameter $s_1 - s_0, \ldots, s_n - s_{n-1}$ (see Theorem 4.1 for an
explicit formula of $P(x^{(i)})$). When the mutation rates $\mu_0$ and $\mu_1$ are strictly positive, the
limiting diffusion has a reversible stationary measure, the density of which is explicit. When
the two mutation rates are equal to $\mu > 0$, we describe the properties of the critical points
of the density of the stationary measure. In particular, we find sufficient conditions on $\mu$ and
$s_1 - s_0, \ldots, s_n - s_{n-1}$ for the state where the frequencies of the two alleles are equal to 1/2 at
each locus to be a global maximum and for the stationary measure to have $2^n$ modes. These
sufficient conditions generalise the independent case. For example, when $\mu > 1/2$ they imply
the following results:

1. if $s_\ell - s_{\ell-1} > -(8\mu - 4)$ for every $\ell \in \{1, \ldots, n\}$, then $(1/2, \ldots, 1/2)$ is the only mode of
the stationary measure;

2. if $s_n - s_{n-1} < \cdots < s_1 - s_0 < -(8\mu - 4)$ then the stationary measure has $2^n$ modes.

These results can be extended to other patterns of assortative mating. In fact, we need only 
make the following assumption on the parameters $s_{i,j}$: the value of the assortment parameter
$s_{i,j}$ between two genotypes $i$ and $j$ is assumed to be the same as the value of $s_{j,i}$ and to
depend only on the loci at which $i$ and $j$ differ. In particular this implies that the value of
$s_{i,i}$ is the same for every genotype $i$. This generalises the Hamming criterion and allows us to
consider more realistic situations in which the influence on mating choice differs between loci
(see §2.3). It transpires that, under these assumptions, the limiting diffusion does not depend
on the whole family of assortment parameters, but only on one coefficient per subgroup of loci
$L$. We denote this coefficient $m_L(s)$. It is the mean of the assortment parameters for pairs of
genotypes that carry different alleles on each locus in $L$ and identical alleles on all other loci.
The stochastic differential equation followed by the limiting diffusion can still be described by
equation (1.1) if the symmetric polynomial term $P(x^{(i)})$ in the drift of the $i$-th coordinate is
replaced by a non-symmetric polynomial term $P_i(x^{(i)})$ in the coefficients of which the quantities
$m_{L \cup \{i\}}(s) - m_L(s)$ for $L \subset \{1, \ldots, n\}$ replace $s_1 - s_0, \ldots, s_n - s_{n-1}$.

The rest of the paper is organized as follows. In §2, we present our multilocus Moran model. 
In §3, we describe the diffusion approximation for the one-locus model and compare it with a 
diffusion approximation for a population undergoing mutation and 'balancing selection'. We 
recall some well-known properties of this diffusion, in particular the boundary behaviour and 
the form of the stationary measure, for later comparison with the multilocus case. In §4, we 
state our main result concerning convergence to a diffusion approximation in the multilocus 
case (Theorem 4.1) and give two equivalent expressions for the limiting diffusion. We then 
compare with the two-locus diffusion approximation obtained in Ethier & Nagylaki (1989).
The proof of Theorem 4.1 is postponed until §7. In §5, we derive some general properties of 
the limiting diffusion. §6 is devoted to the study of the density of the stationary measure. An 
appendix collects some technical results used in the description of the limiting diffusion. 

2 The discrete model 

This section is devoted to a detailed presentation of the individual based model. The as- 
sumptions on assortative mating, recombination and mutation that we will require to establish 
a diffusion approximation for the allelic frequencies are discussed at the end of the section. 
2.1 Description of the model

We consider a monoecious and haploid population of size $N$ where the type of each individual
is described by a sequence of $n$ diallelic loci. For the sake of brevity, let the set
of loci be identified with the set of integers $[\![1;n]\!] := \{1, \ldots, n\}$ and let the two alleles at
each locus be labelled 0 and 1. The type of an individual is then identified by an $n$-tuple
$k := (k_1, \ldots, k_n) \in \{0,1\}^n$. Let $\mathcal{A} = \{0,1\}^n$ be the set of possible types. The proportion
of individuals of type $k$ at time $t \in \mathbb{N}$ will be denoted by $z_t^{(N)}(k)$ so that the composition
of the population is described by the set $z_t^{(N)} = \{z_t^{(N)}(k),\ k \in \mathcal{A}\}$. At each unit of time
the population evolves under the effect of assortative mating, recombination and mutation as
follows.

Assortative mating: at each time $t$, two individuals are sampled from the population in
such a way that:

1. the first individual has probability $z_t^{(N)}(i)$ of being of type $i$;

2. given that the first individual chosen is of type $i$, the probability that the second individual
is of type $j$ is

\[ \frac{(1 + s^{(N)}_{i,j})\, z_t^{(N)}(j)}{\sum_{k \in \mathcal{A}} (1 + s^{(N)}_{i,k})\, z_t^{(N)}(k)}, \]

where the assortment parameters $\{s^{(N)}_{i,j},\ i,j \in \mathcal{A}\}$ are fixed nonnegative real numbers^5.

The population at time $t+1$ is obtained by replacing the first chosen individual with an offspring
whose type is the result of the following process of recombination followed by mutation.

Recombination: for each subset $L$ of $[\![1;n]\!]$, let $r_L$ denote the probability that the offspring
inherits the genes of the first chosen parent at loci $\ell \in L$ and the genes of the second parent
at loci $\ell \notin L$. The family of parameters $\{r_L,\ L \subset [\![1;n]\!]\}$ defines a probability distribution,
called the recombination distribution, on the power set $\mathcal{P}([\![1;n]\!])$ (it was first introduced in this
manner by Geiringer (1944) to describe the recombination-segregation of gametes in a diploid
population). It is natural to assume that the two parents contribute symmetrically to the
offspring genotype, that is:

Assumption H1: for each subset $L$ of $[\![1;n]\!]$, $r_L = r_{\bar L}$ where $\bar L$ denotes the complementary
set of loci, $[\![1;n]\!] \setminus L$.

^5 We are allowing a small chance of self-fertilisation.
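With this notation, the probability that, before mutation, the offspring of a pair of individuals
of types $(i,j)$ is of type $k$ is

\[ q((i,j);k) = \sum_{L \subset [\![1;n]\!]} r_L\, \mathbf{1}_{\{k = (i_{|L},\, j_{|\bar L})\}}, \]

where $(i_{|L}, j_{|\bar L})$ denotes the type that agrees with $i$ at the loci in $L$ and with $j$ at the loci in $\bar L$.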

Let us express some classical examples of recombination distributions in this notation:
Examples 2.1.

1. $r_\emptyset = r_{[\![1;n]\!]} = \frac{1}{2}$ (no recombination, also called absolute linkage)

2. $r_I = 2^{-n}$ for each $I \in \mathcal{P}([\![1;n]\!])$ (free recombination)

3. $r_{[\![1;x]\!]} = r_{[\![x+1;n]\!]} = \frac{r}{2(n-1)}$ for $1 \le x \le n - 1$ and $r_\emptyset = r_{[\![1;n]\!]} = \frac{1}{2}(1 - r)$ where $r$ denotes
an element of $]0, 1]$ (at most one exchange between the sequence of loci which occurs with
equal probability at each position).
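
As an illustration only, the three recombination distributions of Examples 2.1 can be tabulated for a small number of loci and checked numerically. The sketch below is not part of the original analysis: the function names are hypothetical, subsets of loci are represented as Python frozensets, and the helper `separates_all_pairs` encodes the requirement (imposed later as part of the loose-linkage hypothesis) that every ordered pair of distinct loci can be separated by a recombination event of positive probability.

```python
from itertools import combinations, permutations

def subsets(n):
    """All subsets of the loci {1, ..., n}, as frozensets."""
    loci = range(1, n + 1)
    return [frozenset(c) for size in range(n + 1) for c in combinations(loci, size)]

def absolute_linkage(n):
    """r_empty = r_full = 1/2, every other subset has probability 0."""
    full = frozenset(range(1, n + 1))
    return {L: (0.5 if L in (frozenset(), full) else 0.0) for L in subsets(n)}

def free_recombination(n):
    """r_I = 2^(-n) for every subset I."""
    return {L: 2.0 ** (-n) for L in subsets(n)}

def single_crossover(n, r):
    """At most one exchange, equally likely at each of the n - 1 positions (n >= 2)."""
    full = frozenset(range(1, n + 1))
    dist = {L: 0.0 for L in subsets(n)}
    dist[frozenset()] = dist[full] = 0.5 * (1.0 - r)
    for x in range(1, n):
        head = frozenset(range(1, x + 1))
        dist[head] += r / (2 * (n - 1))
        dist[full - head] += r / (2 * (n - 1))
    return dist

def is_symmetric(dist, n):
    """Symmetry between parents: r_L equals r of the complement of L."""
    full = frozenset(range(1, n + 1))
    return all(abs(p - dist[full - L]) < 1e-12 for L, p in dist.items())

def separates_all_pairs(dist, n):
    """For all distinct loci h, k there is I with h in I, k not in I and r_I > 0."""
    return all(
        any(p > 0 and h in I and k not in I for I, p in dist.items())
        for h, k in permutations(range(1, n + 1), 2)
    )
```

With three loci, all three distributions are symmetric and sum to one, while only the last two separate every pair of loci.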

Finally we superpose mutation. 

Mutation: we assume that mutations occur independently and at the same rate at each
locus: $\mu_1^{(N)}$ will denote the probability that an allele 1 at a given locus of the offspring changes
into allele 0 and $\mu_0^{(N)}$ the probability of the reverse mutation. The probability that the mutation
process changes a type $k$ into a type $\ell$ is:

\[ \mu^{(N)}(k,\ell) := \prod_{i=1}^{n} \big(\mu^{(N)}_{k_i}\big)^{|k_i - \ell_i|} \big(1 - \mu^{(N)}_{k_i}\big)^{1 - |k_i - \ell_i|}. \]
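
The three ingredients of one reproduction event (choice of the second parent, recombination, mutation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: types are tuples of 0/1, `z` is a dictionary of type frequencies, `s_N` is a hypothetical function returning the unscaled parameter $s^{(N)}_{i,j}$, and the mating rule is read as "an individual of type $j$ is chosen with weight $1 + s^{(N)}_{i,j}$".

```python
import random

def second_parent_distribution(z, s_N, i):
    """Type distribution of the second parent, given that the first has type i.

    Each individual of type j carries weight 1 + s_N(i, j), so type j has total
    weight (1 + s_N(i, j)) * z[j]; the weights are then normalised.
    """
    weights = {j: (1.0 + s_N(i, j)) * zj for j, zj in z.items()}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def recombine(i, j, L):
    """Offspring type before mutation: genes of i at loci in L (1-based), of j elsewhere."""
    return tuple(i[l] if (l + 1) in L else j[l] for l in range(len(i)))

def mutation_probability(k, l, mu0_N, mu1_N):
    """Probability that independent per-locus mutation turns type k into type l.

    An allele 0 flips with probability mu0_N and an allele 1 with probability mu1_N.
    """
    prob = 1.0
    for a, b in zip(k, l):
        p = mu0_N if a == 0 else mu1_N
        prob *= p if a != b else 1.0 - p
    return prob

def mutate(k, mu0_N, mu1_N, rng):
    """Sample the mutated type from k using the random generator rng."""
    return tuple(
        1 - a if rng.random() < (mu0_N if a == 0 else mu1_N) else a
        for a in k
    )
```

For instance, `recombine((0, 0, 0), (1, 1, 1), {1, 3})` takes loci 1 and 3 from the first parent and locus 2 from the second.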

2.2 Expression for the transition probabilities 

It is now elementary to write down an expression for the transition probabilities of our
model. In the notation above, if $z = \{z(k),\ k \in \mathcal{A}\}$ describes the proportion of individuals of
each type in the population at a given time, then the probability that, in the next time step,
the number of individuals of type $j$ increases by one and the number of individuals of type
$i \ne j$ decreases by one is

\[ f_N(z,i,j) := \sum_{k,\ell \in \mathcal{A}} z(i)\, z(k)\, w^{(N)}(z,i,k)\, q((i,k);\ell)\, \mu^{(N)}(\ell,j) \]

where

\[ w^{(N)}(z,i,k) = \frac{1 + s^{(N)}_{i,k}}{1 + \sum_{\ell \in \mathcal{A}} s^{(N)}_{i,\ell}\, z(\ell)}. \]
2.3 Assumptions on assortative mating, recombination and mutation

In order to obtain a diffusion approximation for a large population, we assume that mutation
and assortment parameters are both $O(N^{-1})$, so we set
Assumption H2: $\mu_\varepsilon^{(N)} = \frac{\mu_\varepsilon}{N}$ for $\varepsilon \in \{0, 1\}$ and $s^{(N)}_{i,j} = \frac{s_{i,j}}{N}$ for $i,j \in \mathcal{A}$.

Just as in the two-locus case studied by Ethier & Nagylaki (1989), we can expect diffusion 
approximations to exist under two quite different assumptions on recombination, correspond- 
ing to tight and loose linkage. Here we focus on loose linkage. More precisely, we assume 
that the recombination distribution does not depend on the size of the population and that 
recombination can occur between any pair of loci: 

Assumption H3: For every $I \in \mathcal{P}([\![1;n]\!])$, $r_I$ does not depend on $N$ and for any distinct
integers $h,k \in [\![1;n]\!]$, there exists a subset $I \in \mathcal{P}([\![1;n]\!])$ such that $h \in I$, $k \notin I$ and $r_I > 0$.

This assumption is satisfied for the last two examples of recombination distribution pre- 
sented in Example 2.1, but not in the absolute linkage case. In infinite population size multilo- 
cus models with random mating, and without selection, this condition is known to ensure that 
in time the genotype frequencies will converge to linkage equilibrium, where they are products 
of their respective marginal allelic frequencies (see Geiringer 1944 and Nagylaki 1993 for a 
study of the evolution of multilocus linkage disequilibria under weak selection).

In order that the generator of the limiting diffusion has a tractable form, we shall make
two further assumptions on the family of assortment coefficients $s = \{s_{i,j},\ (i,j) \in \mathcal{A}^2\}$:
Assumption H4: for every $(i,j) \in \mathcal{A}^2$,

1. $s_{i,j} = s_{j,i}$;

2. the value of $s_{i,j}$ depends only on the set of loci $k$ at which $i_k = 0$ and $j_k = 1$ and on the
set of loci $\ell$ at which $i_\ell = 1$ and $j_\ell = 0$.

These conditions mean that the probability of mating between two individuals at a fixed
time depends only on the difference between their types. In particular, two individuals of
the same type will have a probability of mating that does not depend on their common type:
$s_{i,i} = s_{j,j}$ for every $i,j \in \mathcal{A}$. In the one-locus case, this assumption means that the model
distinguishes only two classes of pairs of individuals since $s_{0,1} = s_{1,0}$ and $s_{0,0} = s_{1,1}$.
In the two-locus case, this assumption leads to a model with five assortment parameters:

• one parameter, $s_{00,00} = s_{11,11} = s_{10,10} = s_{01,01}$, for pairs of individuals having the same
genotype,

• one parameter, $s_{00,10} = s_{10,00} = s_{11,01} = s_{01,11}$, for pairs of individuals whose genotypes
only differ on the first locus,

• one parameter, $s_{00,01} = s_{01,00} = s_{11,10} = s_{10,11}$, for pairs of individuals whose genotypes
only differ on the second locus,

• two parameters, $s_{01,10} = s_{10,01}$ and $s_{00,11} = s_{11,00}$, for pairs of individuals whose genotypes
differ on the two loci.

To describe positive or negative assortative mating we have to choose how to quantify 
similarities between two types. Let us present two criteria that provide assortment parameters 
for which assumption H4 is fulfilled: 

1. Hamming Criterion. One simple measure to quantify similarities between two types is
the number of loci with distinct alleles: $s_{i,j}$ will be defined as nonnegative reals depending
only on the Hamming distance between $i$ and $j$ denoted by $d_h(i,j) := \sum_{\ell=1}^{n} |i_\ell - j_\ell|$. A
positive assortative mating will be described by a sequence of $n + 1$ nonnegative reals
$s_0 > s_1 > \ldots > s_n$ by setting $s_{i,j} = s_{d_h(i,j)}$ for every $i,j \in \mathcal{A}$. This criterion will be
called Hamming criterion.

2. Additive Criterion. If we assume that the assortment is based on a phenotypic trait
which is determined by the $n$ genes whose effects are similar and additive, then a convenient
measure of the difference between individuals of type $i$ and $j$ is
$d_a(i,j) := \big|\sum_{\ell=1}^{n} (i_\ell - j_\ell)\big|$. A positive assortative mating will be described by a sequence
of $n + 1$ nonnegative reals $s_0 > s_1 > \ldots > s_n$ by setting $s_{i,j} = s_{d_a(i,j)}$ for every $i,j \in \mathcal{A}$.
This criterion will be called additive criterion.
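
The two criteria above are easy to compare on small examples. The following sketch (hypothetical function names, genotypes as tuples of 0/1, brute-force enumeration that is only practical for a handful of loci) builds the assortment function from a distance and a sequence $s_0, \ldots, s_n$, and checks the two requirements of assumption H4: symmetry, and dependence only on the locus-by-locus difference between the two types.

```python
from itertools import product

def hamming_distance(i, j):
    """Number of loci at which the two genotypes carry distinct alleles."""
    return sum(abs(a - b) for a, b in zip(i, j))

def additive_distance(i, j):
    """Absolute difference of the additive trait values (number of 1-alleles)."""
    return abs(sum(a - b for a, b in zip(i, j)))

def assortment_from_distance(dist, s_seq):
    """Assortment function s(i, j) = s_seq[dist(i, j)]."""
    return lambda i, j: s_seq[dist(i, j)]

def satisfies_H4(s, n):
    """Brute-force check of assumption H4 over all pairs of types on n loci."""
    types = list(product((0, 1), repeat=n))
    seen = {}
    for i in types:
        for j in types:
            if s(i, j) != s(j, i):
                return False
            # -1 where i=0, j=1; +1 where i=1, j=0; 0 where the alleles agree.
            key = tuple(a - b for a, b in zip(i, j))
            if seen.setdefault(key, s(i, j)) != s(i, j):
                return False
    return True
```

Note that the two distances can disagree: the genotypes (0, 1, 1) and (1, 0, 1) are at Hamming distance 2 but at additive distance 0, since they carry the same number of 1-alleles.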

The assortative mating in the species formation model of Higgs & Derrida (1992) is a 
special case of the Hamming criterion. The additive criterion is widely used in models in 
which assortative mating is determined by an additive genetic trait. For example, Devaux 
& Lande (2008) use it to investigate speciation in flowering plants due to assortative mating 
determined by flowering time. Flowers can only be pollinated by other flowers that are open 
at the same time. Modelling flowering time as an additive trait, they observe an effect that
is qualitatively similar to that observed in the simulations of Higgs & Derrida (1992) for the 
Hamming criterion, namely continuous creation of reproductively isolated subgroups. 

With the Hamming and additive criteria, every locus is assumed to have an identical posi- 
tive or negative influence on the assortment. As we have defined a general family of assortment 
parameters, it is possible to consider more complex situations. For instance, we can take into 
account that some loci have a greater influence on the mating choice than others by dividing the 
set of loci into two disjoint subgroups $G_1$ and $G_2$; we introduce two sets of assortment parameters
$s^{(1)}$ and $s^{(2)}$ that satisfy assumption H4 for the subgroups of loci $G_1$ and $G_2$ respectively. If
we assume that the effects of the two subgroups are additive we set
$s_{i,j} = s^{(1)}_{i_{|G_1},\, j_{|G_1}} + s^{(2)}_{i_{|G_2},\, j_{|G_2}}$
for every $i,j \in \mathcal{A}$. This defines a set of assortment parameters that satisfies assumption H4.
More generally, any set of assortment parameters defined as a function of $s^{(1)}$ and $s^{(2)}$ satisfies
assumption H4.

3 The one-locus diffusion approximation 

Before studying the multilocus case, for later comparison, in this section we record some 
properties of the one-locus model. 

3.1 The generator of the one-locus diffusion 

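In the case of one locus ($n = 1$), under assumption H2, the frequency of 0-alleles satisfies:

\[ \mathbb{E}_z\big[z_1^{(N)}(0) - z\big] = \frac{1}{N^2}\Big((1 - z)\mu_1 - z\mu_0
   + \frac{1}{2} z(1 - z)\big((s_{1,0} - s_{1,1})(1 - z) - (s_{0,1} - s_{0,0}) z\big)\Big) + O(1/N^3) \]

\[ \mathbb{E}_z\big[(z_1^{(N)}(0) - z)^2\big] = \frac{1}{N^2} z(1 - z) + O(1/N^3) \]

\[ \mathbb{E}_z\big[(z_1^{(N)}(0) - z)^4\big] = O(1/N^4) \]

uniformly in $z$.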

Therefore the distribution of the frequency of 0-alleles at time $[N^2 t]$ is approximately governed,
when $N$ is large, by a diffusion with generator:

\[ \mathcal{G}_{1,s} = \frac{1}{2} x(1 - x) \frac{d^2}{dx^2} + \Big((1 - x)\mu_1 - x\mu_0
   + \frac{1}{2} x(1 - x)\big((s_{1,0} - s_{1,1})(1 - x) - (s_{0,1} - s_{0,0}) x\big)\Big) \frac{d}{dx}. \qquad (3.1) \]

More precisely, if $z_0^{(N)}(0)$ converges in distribution in $[0, 1]$ as $N$ tends to $+\infty$, then $(z^{(N)}_{[N^2 t]}(0))_{t \ge 0}$
converges in distribution in the Skorohod space of càdlàg functions $D_{[0,1]}([0, +\infty))$ to a diffusion
with generator $\mathcal{G}_{1,s}$ (see, for example, Ethier & Kurtz 1986, Chapter 10).



If we assume that $s$ satisfies assumption H4, that is $s_{0,0} = s_{1,1}$ and $s_{0,1} = s_{1,0}$, and if we
denote their common values by $s_0$ and $s_1$ respectively, then the drift has a simpler form and
we obtain

\[ \mathcal{G}_{1,s} = \frac{1}{2} x(1 - x) \frac{d^2}{dx^2} + \Big((1 - x)\mu_1 - x\mu_0 + (s_1 - s_0)(1/2 - x)\, x(1 - x)\Big) \frac{d}{dx}. \]

Remark 3.1. This diffusion can also be obtained as an approximation of a diploid model with
random mating, mutation and weak selection in favour of homozygosity^6 (when $s_0 - s_1 > 0$) or
in favour of heterozygosity (when $s_0 - s_1 < 0$) (see, for example, Ethier & Kurtz 1986, Chapter
10).



3.2 Properties of the one-locus diffusion 

Stationary measure. If $\mu_0$ and $\mu_1$ are strictly positive, this diffusion has a reversible stationary
measure. Its density with respect to Lebesgue measure on $[0, 1]$ is given by Wright's
formula:

\[ g_{\mu,s}(x) = C_{\mu,s}\, x^{2\mu_1 - 1} (1 - x)^{2\mu_0 - 1} \exp\Big(-\frac{1}{2}\big((s_{1,0} - s_{1,1})(1 - x)^2 + (s_{0,1} - s_{0,0}) x^2\big)\Big) \]

where the constant $C_{\mu,s}$ is chosen so that $\int_0^1 g_{\mu,s}(x)\,dx = 1$. This is plotted, for various parameter
values, in Fig. 1 under the assumptions $\mu_1 = \mu_0 = \mu$, $s_{0,0} = s_{1,1} = s_0$ and $s_{0,1} = s_{1,0} = s_1$.
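
In the symmetric case $\mu_1 = \mu_0 = \mu$, the shape of this density can be explored numerically. The sketch below is only an illustration under stated assumptions: it evaluates the unnormalised logarithm of the density, $(2\mu - 1)\log(x(1-x)) - \frac{1}{2}(s_1 - s_0)(x^2 + (1-x)^2)$, on a uniform grid (the grid size is an arbitrary choice) and counts strict interior local maxima. Differentiating this expression twice at $x = 1/2$ gives $-16\mu + 8 + 2(s_0 - s_1)$, which is positive exactly when $s_0 - s_1 > 8\mu - 4$.

```python
import math

def log_density(x, mu, s0, s1):
    """Unnormalised log of Wright's density when mu_0 = mu_1 = mu and H4 holds."""
    return (2 * mu - 1) * (math.log(x) + math.log(1 - x)) \
        - 0.5 * (s1 - s0) * (x * x + (1 - x) ** 2)

def count_modes(mu, s0, s1, grid=2001):
    """Count strict local maxima of the density on an interior uniform grid."""
    xs = [(k + 1) / (grid + 1) for k in range(grid)]
    vals = [log_density(x, mu, s0, s1) for x in xs]
    return sum(
        1
        for k in range(1, grid - 1)
        if vals[k] > vals[k - 1] and vals[k] > vals[k + 1]
    )
```

With $\mu = 1$ the threshold is $s_0 - s_1 = 4$: `count_modes(1.0, 6.0, 0.0)` finds two modes, located near $x(1-x) = 1/6$, while `count_modes(1.0, 2.0, 0.0)` finds the single mode at 1/2.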



Boundary behaviour. According to the Feller boundary classification for one-dimensional
diffusions (see e.g. Ethier & Kurtz 1986):

(i) if $\mu_1 = 0$ then 0 is an absorbing state and the diffusion exits from $]0, 1[$ in a finite time
almost surely;

(ii) if $\mu_1 \ge 1/2$ then 0 is an entrance boundary (started from a point in $]0, 1[$ the diffusion
will not reach 0 in finite time, but the process started from 0 is well-defined);

(iii) if $0 < \mu_1 < 1/2$ then 0 is a regular boundary (starting from a point $z_0 \in\, ]0, 1[$ the diffusion
has a positive probability of reaching 0 before any point $b \in\, ]z_0, 1]$ in a finite time and the
diffusion started from 0 is well-defined);

with the obvious symmetric definitions at 1.

Figure 1: Representation of the invariant density $g_{\mu,s}$ for the one-locus diffusion when the two
mutation rates satisfy $\mu_1 = \mu_0 = \mu$, $s_{0,0} = s_{1,1} = s_0$ and $s_{0,1} = s_{1,0} = s_1$. In the figure on the left,
$\mu > 1/2$ and matings between individuals of the same allelic type are favoured. The density is
bimodal if and only if $s_0 - s_1 > 8\mu - 4$. In the figure on the right, $0 < \mu < 1/2$ and matings
between individuals of different allelic types are favoured. The density tends to $+\infty$ at the
boundaries and has a global minimum at 1/2 if and only if $s_1 - s_0 < 4 - 8\mu$.

^6 A diploid individual is homozygous at a gene locus when its cells contain two identical alleles at the locus.



4 Convergence to a diffusion in the multilocus case 



In the case of several loci, under assumptions H2 and H3, a Taylor expansion shows that
the drift $\mathbb{E}\big[z_{t+1}^{(N)}(i) - z_t^{(N)}(i) \mid z_t^{(N)} = z\big]$ is of order $N^{-2}$ only inside the set of product
distributions on $\{0, 1\}^n$. This set is often called the Wright manifold or the linkage-equilibrium
manifold and denoted by $\mathcal{W}_n$ (a population is said to be in linkage equilibrium if the genotype
distribution $z$ is in $\mathcal{W}_n$, that is if $z(i) = z_1(i_1) \cdots z_n(i_n)$ $\forall i = (i_1, \ldots, i_n) \in \mathcal{A}$ where
$z_j(x) = \sum_{k \in \{0,1\}^{n-1}} z(k_1, \ldots, k_{j-1}, x, k_{j+1}, \ldots, k_n)$ denotes the frequency of individuals having
the allele $x$ at the $j$-th locus).

Outside this manifold, the drift pushes the process towards the Wright manifold at an exponential
speed. Therefore to extend the diffusion approximation to the $n$-locus case, we introduce
a change of variables composed of the $n$ 0-allelic frequencies and of $2^n - n - 1$ processes that
measure the deviation from the linkage equilibrium.
For a nonempty subset $L$ of $\{1, \ldots, n\}$,

• let $X_t^{(N)}(L) = \sum_{j \in \mathcal{A},\ j_{|L} \equiv 0} z_t^{(N)}(j)$ denote the proportion of individuals having the allele 0
on all loci in $L$ at time $t$;

• let $Y_t^{(N)}(L) = \prod_{\ell \in L} X_t^{(N)}(\{\ell\}) - X_t^{(N)}(L)$ for $|L| \ge 2$ describe the linkage disequilibrium
between the loci in $L$ at time $t$. (This is just one of many ways to measure the linkage
disequilibrium, see for example Bürger (2000), Chapter V.4.2, for other measures.)

The vector of 0-allelic frequencies at time $t$ is $X_t^{(N)} := \big(X_t^{(N)}(\{1\}), \ldots, X_t^{(N)}(\{n\})\big)$.

The process $Y^{(N)}$ defined by $Y_t^{(N)} := \{Y_t^{(N)}(L),\ L \subset [\![1;n]\!] \text{ such that } |L| \ge 2\}$ for $t \ge 0$
vanishes on the Wright manifold.

We shall show that if $t_N$ tends to $+\infty$ faster than $N$ then $Y^{(N)}_{[t_N]}$ converges to 0 while if time is
sped up by $N^2$ then $X^{(N)}$ converges to a diffusion as $N$ tends to $+\infty$.

Before giving a precise statement of the convergence result for the two processes $X^{(N)}$ and
$Y^{(N)}$ (Theorem 4.1), let us introduce some notation in which to express the parameters of the
limiting diffusion.
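
To make the change of variables concrete, the following sketch computes the frequencies $X(L)$ and the disequilibria $Y(L)$ from a genotype distribution and checks that $Y$ vanishes on a product distribution. It is an illustration under stated assumptions: the distribution is a dictionary mapping 0/1 tuples to frequencies, loci are numbered from 1, and the function names are hypothetical.

```python
from itertools import product

def zero_allele_frequency(z, L):
    """X(L): total frequency of the types carrying allele 0 at every locus of L."""
    return sum(p for k, p in z.items() if all(k[l - 1] == 0 for l in L))

def linkage_disequilibrium(z, L):
    """Y(L): product of the one-locus frequencies X({l}), l in L, minus X(L)."""
    prod = 1.0
    for l in L:
        prod *= zero_allele_frequency(z, (l,))
    return prod - zero_allele_frequency(z, L)

def product_distribution(x):
    """Point of the Wright manifold with 0-allele frequency x[l-1] at locus l."""
    z = {}
    for k in product((0, 1), repeat=len(x)):
        p = 1.0
        for allele, xl in zip(k, x):
            p *= xl if allele == 0 else 1.0 - xl
        z[k] = p
    return z
```

For the two-locus population made of equal proportions of types (0, 0) and (1, 1), both one-locus frequencies equal 1/2 but $X(\{1,2\}) = 1/2$, so $Y(\{1,2\}) = 1/4 - 1/2 = -1/4$: the loci are completely associated.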

4.1 Mean assortment parameters 

For a subset $L$ of loci, consider the set of pairs of genotypes that differ at each locus $\ell \in L$
and are equal at each locus $\ell \notin L$:

\[ F_L = \big\{(i,j) \in \mathcal{A}^2 :\ i_u = 1 - j_u\ \forall u \in L \text{ and } i_u = j_u\ \forall u \in \bar L\big\}. \]

Let $m_L(s)$ denote the mean value of the assortment parameters for all pairs in this set $F_L$:

\[ m_L(s) = 2^{-n} \sum_{(i,j) \in F_L} s_{i,j}. \]
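
The mean assortment parameters can be computed by brute force for a small number of loci. The sketch below (hypothetical names, assortment function passed as a Python callable on pairs of 0/1 tuples) enumerates $F_L$ by letting $i$ range over all types and flipping the loci of $L$ to obtain $j$. Under the Hamming criterion every pair in $F_L$ is at distance $|L|$, so the mean is $s_{|L|}$; under the additive criterion, a pair in which $k$ of the loci of $L$ carry allele 1 in $i$ is at distance $|2k - |L||$, so the mean is the binomial average $2^{-|L|} \sum_{k} \binom{|L|}{k} s_{|2k - |L||}$.

```python
from itertools import product
from math import comb

def mean_assortment(s, n, L):
    """m_L(s): average of s(i, j) over the 2^n pairs differing exactly on the loci of L."""
    total = 0.0
    for i in product((0, 1), repeat=n):
        j = tuple(1 - a if (l + 1) in L else a for l, a in enumerate(i))
        total += s(i, j)
    return total / 2 ** n

def hamming_assortment(s_seq):
    """s(i, j) = s_seq[number of loci at which i and j differ]."""
    return lambda i, j: s_seq[sum(abs(a - b) for a, b in zip(i, j))]

def additive_assortment(s_seq):
    """s(i, j) = s_seq[|sum over loci of (i_l - j_l)|]."""
    return lambda i, j: s_seq[abs(sum(a - b for a, b in zip(i, j)))]

def additive_mean_closed_form(s_seq, size):
    """Binomial average of s_{|2k - size|} over k = 0, ..., size."""
    return sum(comb(size, k) * s_seq[abs(2 * k - size)] for k in range(size + 1)) / 2 ** size
```

For example, with three loci and $(s_0, s_1, s_2, s_3) = (5, 3, 2, 0.5)$, the additive criterion gives $m_{\{1,2\}}(s) = \frac{1}{4}(s_2 + 2 s_0 + s_2) = 3.5$.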
Examples 4.1.

1. In the two-locus case,

\[ m_\emptyset(s) = \tfrac{1}{4}\big(s_{00,00} + s_{01,01} + s_{10,10} + s_{11,11}\big), \]

\[ m_{\{1\}}(s) = \tfrac{1}{4}\big(s_{00,10} + s_{10,00} + s_{01,11} + s_{11,01}\big), \]

\[ m_{\{2\}}(s) = \tfrac{1}{4}\big(s_{00,01} + s_{01,00} + s_{11,10} + s_{10,11}\big). \]

In each of these expressions the four coefficients are equal by assumption H4.

\[ m_{\{1,2\}}(s) = \tfrac{1}{4}\big(s_{00,11} + s_{11,00} + s_{01,10} + s_{10,01}\big). \]

In this expression the first (resp. last) two coefficients are equal by H4.

2. With the Hamming criterion, $m_L(s) = s_{|L|}$ for every $L \subset [\![1;n]\!]$, where $|L|$ denotes the
number of loci in $L$.

3. With the additive criterion, $m_\emptyset(s) = s_0$, $m_{\{\ell\}}(s) = s_1$ $\forall \ell \in [\![1;n]\!]$ and more generally

\[ m_L(s) = 2^{-|L|} \sum_{k=0}^{|L|} \binom{|L|}{k}\, s_{|2k - |L||} \quad \text{for every } L \subset [\![1;n]\!]. \]

4.2 Convergence to a diffusion

The following theorem provides convergence results for the two processes $X^{(N)}$ and $Y^{(N)}$
as the population size $N$ tends to $+\infty$. The proof, based on Theorem 3.3 of Ethier & Nagylaki
(1980), is postponed to §7.

Theorem 4.1. Assume that hypotheses H1, H2, H3 and H4 hold.

(a) For $i \in [\![1;n]\!]$ let $P_{i,s}(x)$ denote a polynomial function in the $n - 1$ variables $x_k(1 - x_k)$
for $k \in [\![1;n]\!] \setminus \{i\}$. Then, the operator

\[ \mathcal{G}_{n,s} = \frac{1}{2} \sum_{i=1}^{n} x_i(1 - x_i) \frac{\partial^2}{\partial x_i^2}
   + \sum_{i=1}^{n} \Big((1 - x_i)\mu_1 - x_i\mu_0 + (1/2 - x_i)\, x_i(1 - x_i)\, P_{i,s}(x)\Big) \frac{\partial}{\partial x_i} \qquad (4.1) \]

with domain $\mathcal{D}(\mathcal{G}_{n,s}) = C^2([0, 1]^n)$ is closable in $C([0, 1]^n)$ and its closure is the generator
of a strongly continuous semigroup of contractions.
(b) If $X_0^{(N)}$ converges in distribution in $[0,1]^n$, then $(X^{(N)}_{[N^2 t]})_{t \ge 0}$ converges in distribution in the
Skorohod space of càdlàg functions $D_{[0,1]^n}([0, \infty))$ to a diffusion process $X$ with generator
$\mathcal{G}_{n,s}$ where the polynomial function $P_{i,s}(x)$ has the following expression:

\[ P_{i,s}(x) = \sum_{A \in \mathcal{P}([\![1;n]\!] \setminus \{i\})} \big(m_{A \cup \{i\}}(s) - m_A(s)\big)
   \prod_{k \in A} \big(2 x_k(1 - x_k)\big) \prod_{\ell \in [\![1;n]\!] \setminus (A \cup \{i\})} \big(1 - 2 x_\ell(1 - x_\ell)\big). \qquad (4.2) \]

(c) For every positive sequence $(t_N)_N$ that converges to $+\infty$, $Y^{(N)}_{[N t_N]}$ converges in distribution
to 0.

Remark 4.1. 

1. The recombination distribution $(r_I)_{I \subset [\![1;n]\!]}$ does not appear in the expression for the limiting
diffusion. Nevertheless, the proof of Theorem 4.1 will show that it has an influence
on the speed of convergence of the linkage disequilibrium to 0.

2. The limiting diffusion depends on the assortment parameters only via the quantities
$m_A(s)$ for every $A \subset [\![1;n]\!]$. A set of assortment parameters for which

$m_{A \cup \{i\}}(s) - m_A(s) < 0$ for every $i \in [\![1;n]\!]$ and $A \subset [\![1;n]\!] \setminus \{i\}$

favours homozygous mating with respect to the genotype at the $i$-th locus. It is therefore
no surprise that by increasing the value of $m_{A \cup \{i\}}(s) - m_A(s)$ for a fixed subset $A$, we
increase the value of the $i$-th coordinate of the drift at a point $x$ for which $x_i < 1/2$ and
decrease it at a point $x$ for which $x_i > 1/2$.

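4.3 Another expression for the polynomial term $P_{i,s}(x)$ of the drift

An expansion of the polynomial function $P_{i,s}(x)$ in terms of the variables $x_k(1 - x_k)$,
$k \in [\![1;n]\!] \setminus \{i\}$ yields the following expression:

\[ P_{i,s}(x) = \sum_{L \in \mathcal{P}([\![1;n]\!] \setminus \{i\})} \alpha_{i,L}(s) \prod_{\ell \in L} x_\ell(1 - x_\ell) \qquad (4.3) \]

with

\[ \alpha_{i,L}(s) = 2^{|L|} \sum_{A \subset L} (-1)^{|L| - |A|} \big(m_{A \cup \{i\}}(s) - m_A(s)\big). \qquad (4.4) \]

The details of the proof are provided in §7.2.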
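
The agreement between the product form (4.2) and the expanded form (4.3)-(4.4) can be checked numerically for a few loci. The sketch below is only a sanity check under stated assumptions: the mean assortment parameters are supplied as a hypothetical callable `m` on frozensets of loci (numbered from 1), and `x` is the tuple of 0-allelic frequencies. It also illustrates the independent case: when $m_{A \cup \{i\}}(s) - m_A(s)$ equals a constant $c$ for every $A$, the product form sums to $c$, because $\prod_k (2y_k + (1 - 2y_k)) = 1$.

```python
from itertools import combinations

def subsets(items):
    """All subsets of an iterable of loci, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def P_product_form(i, x, m):
    """Sum over A of (m(A u {i}) - m(A)) prod_{k in A} 2y_k prod_{l not in A} (1 - 2y_l)."""
    others = [k for k in range(1, len(x) + 1) if k != i]
    total = 0.0
    for A in subsets(others):
        term = m(A | {i}) - m(A)
        for k in others:
            y = x[k - 1] * (1.0 - x[k - 1])
            term *= 2.0 * y if k in A else 1.0 - 2.0 * y
        total += term
    return total

def alpha(i, L, m):
    """Coefficient 2^|L| sum_{A subset of L} (-1)^(|L|-|A|) (m(A u {i}) - m(A))."""
    return 2 ** len(L) * sum(
        (-1) ** (len(L) - len(A)) * (m(A | {i}) - m(A)) for A in subsets(L)
    )

def P_expanded_form(i, x, m):
    """Sum over L of alpha(i, L) prod_{l in L} y_l, with y_l = x_l (1 - x_l)."""
    others = [k for k in range(1, len(x) + 1) if k != i]
    total = 0.0
    for L in subsets(others):
        term = alpha(i, L, m)
        for l in L:
            term *= x[l - 1] * (1.0 - x[l - 1])
        total += term
    return total
```

A typical use is to pass `m = lambda A: s_seq[len(A)]` for the Hamming criterion and compare the two functions at an arbitrary point of $[0,1]^n$.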
The coefficients $\alpha_{i,L}(s)$ can be compactly expressed using difference operators. Let us
introduce some notation: for a function $f$ defined on the subsets of a finite set $E$ and for an
element $i$ of $E$, we denote by $\delta_i[f]$ the function on $\mathcal{P}(E)$ defined by

\[ \delta_i[f](A) = f(A \cup \{i\}) - f(A), \quad \forall A \in \mathcal{P}(E). \]

Since $\delta_i \circ \delta_j = \delta_j \circ \delta_i$ for every $i,j \in E$, we can, more generally, introduce a difference operator
$\delta_B$ for each subset $B \in \mathcal{P}(E)$ by setting $\delta_\emptyset = \mathrm{Id}$, and $\delta_B = \delta_{b_1} \circ \cdots \circ \delta_{b_r}$ if $B = \{b_1, \ldots, b_r\}$. A
proof by induction on $|B|$ provides the following formula for $\delta_B$:

\[ \delta_B[f](A) = \sum_{J \subset B} (-1)^{|B| - |J|} f(A \cup J) \quad \forall A \subset E. \qquad (4.5) \]

Let $m(s)$ denote the function $A \mapsto m_A(s)$ defined on the subsets of $[\![1;n]\!]$. In this notation, for
every $A \subset [\![1;n]\!]$

\[ m_{A \cup \{i\}}(s) - m_A(s) = \delta_i[m(s)](A) \quad \text{and} \quad \alpha_{i,A}(s) = 2^{|A|}\, \delta_{A \cup \{i\}}[m(s)](\emptyset). \qquad (4.6) \]

If, for each subset $A$ of loci, the coefficient $m_A(s)$ depends only on the number of loci in
$A$, then it follows from expression (4.3) that $P_{i,s}(x)$ is a symmetric polynomial function, the
coefficients of which do not depend on $i$. This is the case for instance with the Hamming and
additive criteria (see Example 4.1 for the corresponding expressions for $m_A(s)$). Let us give
the expanded form of $P_{i,s}(x)$ for the Hamming criterion:

\[ P_{i,s}(x) = \sum_{\ell=0}^{n-1} \alpha_\ell(s) \sum_{L\subset[\![1;n]\!]\setminus\{i\},\ |L|=\ell}\ \prod_{k\in L} x_k(1-x_k), \tag{4.7} \]

where $\alpha_k(s) = 2^k \sum_{\ell=0}^{k} (-1)^\ell \binom{k}{\ell} (s_{k-\ell+1} - s_{k-\ell})$.

As in the general case, the coefficient $\alpha_k(s)$ has a compact expression in terms of difference
operators. Let $\delta^{(1)}$ denote the forward difference operator: $\delta^{(1)}[s](i) = s_{i+1} - s_i$ for every
$i \in [\![0;n-1]\!]$. The forward difference operators of higher orders are defined iteratively:
$\delta^{(k+1)}[s] = \delta^{(k)} \circ \delta^{(1)}[s]$ for $k \in \mathbb{N}^*$. With this notation, $\alpha_k(s) = 2^k\,\delta^{(k+1)}[s](0)$ for $k \in [\![0;n-1]\!]$.
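As an illustration of the two expressions for the Hamming coefficients, the sketch below computes $\alpha_k(s)$ both from the binomial sum stated after (4.7) and from the iterated forward difference $2^k\delta^{(k+1)}[s](0)$. The sequence `s` is a hypothetical example, not a value taken from the paper.

```python
from math import comb


def forward_difference(seq, order):
    """Apply the forward difference operator `order` times to a list."""
    out = list(seq)
    for _ in range(order):
        out = [out[i + 1] - out[i] for i in range(len(out) - 1)]
    return out


def alpha_binomial(s, k):
    """alpha_k(s) = 2^k * sum_l (-1)^l C(k, l) (s_{k-l+1} - s_{k-l})."""
    return 2 ** k * sum(
        (-1) ** l * comb(k, l) * (s[k - l + 1] - s[k - l]) for l in range(k + 1)
    )


def alpha_difference(s, k):
    """alpha_k(s) = 2^k * delta^(k+1)[s](0)."""
    return 2 ** k * forward_difference(s, k + 1)[0]


s = [0, -2, -8, -9, -20]  # hypothetical Hamming sequence s_0, ..., s_4 (n = 4)
n = len(s) - 1
alphas_a = [alpha_binomial(s, k) for k in range(n)]
alphas_b = [alpha_difference(s, k) for k in range(n)]
```

With integer inputs both lists are exact, and `alphas_a[0]` is simply $s_1 - s_0$.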

4.4 Comparison with the two-locus Wright-Fisher diffusion 

Ethier & Nagylaki (1989) established convergence results for a general multiallelic two-locus
Wright-Fisher model of a panmictic, monoecious, diploid population of $N$ individuals
(identified with $2N$ haploids) undergoing mutation and selection. In their model, a gamete is
described by a pair $i = (i_1,i_2) \in [\![1;r_1]\!]\times[\![1;r_2]\!]$ where $r_1$ is the number of alleles at the first
locus and $r_2$ is the number of alleles at the second locus. The parameters of their model are:

1. the viability of a pair of gametes $(i,j)$, denoted by $w_{N,i,j} = 1 - \sigma_{N,i,j}$, with the assumption
$\sigma_{N,i,j} = \sigma_{N,j,i}$ and $\sigma_{N,i,i} = 0$ for every $i,j \in [\![1;r_1]\!]\times[\![1;r_2]\!]$ (after viability selection
the proportion of a pair of gametes $(i,j)$ is assumed to be
$P^*_{i,j} = w_{N,i,j}P_iP_j \big/ \sum_{k,\ell} w_{N,k,\ell}P_kP_\ell$
if $P_k$ denotes the frequency of gametes $k$ in the population, for every $k \in [\![1;r_1]\!]\times[\![1;r_2]\!]$);

2. the recombination fraction $c_N$;

3. the probability $(2N)^{-1}\nu^{(i)}_{j,k}$ that the j-th allele at the i-th locus mutates to the k-th allele.

The population at generation $t+1$ is obtained by choosing $2N$ gametes uniformly at
random with replacement from the pool of gametes of generation $t$ after the steps of
viability selection, recombination and mutation.

They studied the diffusion approximation under several assumptions on selection and
recombination coefficients. In the case of weak selection ($2N\sigma_{N,i,j}$ converges to a real number
denoted by $\sigma_{i,j}$ for every $i,j$) and loose linkage ($c_N$ converges to a finite limit and $Nc_N$ tends
to $+\infty$) they obtained a limiting diffusion for the allelic frequencies $(p_1,\dots,p_{r_1-1};\,q_1,\dots,q_{r_2-1})$
of the alleles $1,\dots,r_1-1$ at the first locus and the alleles $1,\dots,r_2-1$ at the second locus. In
the case of two alleles at each locus ($r_1 = r_2 = 2$), the generator of the limiting diffusion is

with 

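\[ \begin{aligned}
b_1(p_1,q_1) ={}& \nu^{(1)}_{2,1}(1-p_1) - \nu^{(1)}_{1,2}\,p_1 \\
&- p_1(1-p_1)(1-2p_1)\bigl((\sigma_{12,21}+\sigma_{11,22})\,q_1(1-q_1) + \sigma_{11,21}\,q_1^2 + \sigma_{12,22}(1-q_1)^2\bigr) \\
&- 2p_1(1-p_1)\,q_1(1-q_1)\bigl(\sigma_{11,12}\,p_1 - \sigma_{21,22}(1-p_1)\bigr), \\
b_2(p_1,q_1) ={}& \nu^{(2)}_{2,1}(1-q_1) - \nu^{(2)}_{1,2}\,q_1 \\
&- q_1(1-q_1)(1-2q_1)\bigl((\sigma_{12,21}+\sigma_{11,22})\,p_1(1-p_1) + \sigma_{11,12}\,p_1^2 + \sigma_{21,22}(1-p_1)^2\bigr) \\
&- 2q_1(1-q_1)\,p_1(1-p_1)\bigl(\sigma_{11,21}\,q_1 - \sigma_{12,22}(1-q_1)\bigr).
\end{aligned} \]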



Accordingly, the generator $\mathcal{L}$ coincides with $\mathcal{G}_{2,s}$ if we assume

(a) that the mutation rates $\nu^{(i)}_{j,k}$ do not depend on the locus $i$ and set $\nu^{(i)}_{1,2} = \mu_0$ and $\nu^{(i)}_{2,1} = \mu_1$;

(b) that the coefficients of selection satisfy $\sigma_{11,21} = \sigma_{12,22}$ and $\sigma_{11,12} = \sigma_{21,22}$ (second
condition of assumption H4)

and set $\sigma_{i,j} = -\tfrac{1}{2}\,s_{i-\mathbf{1},\,j-\mathbf{1}}$ for every $i,j \in \{1,2\}^2$ (with the notation $\mathbf{1} = (1,\dots,1)$).
This comparison suggests that the effect of assortative mating on the genotype evolution of a
large population in our model is similar to the effect of weak viability selection in a diploid
Wright-Fisher model with mutation.



5 Description of the limiting diffusion 



This section collects some properties that can be deduced from the form of the generator,
$\mathcal{G}_{n,s}$, of the limiting diffusion.

5.1 The set of generators arising from the model
Lemma 5.1. Any generator on $C^2([0,1]^n)$ of the form

\[ \frac{1}{2}\sum_{i=1}^{n} x_i(1-x_i)\frac{\partial^2}{\partial x_i^2} + \sum_{i=1}^{n}\Bigl((1-x_i)\mu_1 - x_i\mu_0 + (1/2-x_i)\,x_i(1-x_i)\sum_{L\in\mathcal{P}([\![1;n]\!]\setminus\{i\})} \alpha_{L\cup\{i\}} \prod_{k\in L} x_k(1-x_k)\Bigr)\frac{\partial}{\partial x_i}, \]

where $\{\alpha_A,\ A \subset [\![1;n]\!],\ A \neq \emptyset\}$ is a family of real numbers, can be interpreted as the generator
of the diffusion approximation of an n-locus Moran model as defined above.

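Proof. We may, for instance, take the following set of assortment parameters $\{s_{i,j},\ i,j \in \mathcal{A}\}$:

• $s_{i,i} = 0$ for every $i \in \mathcal{A}$;

• $s_{i,j} = \sum_{B\subset L,\ |B|\geq 1} 2^{1-|B|}\alpha_B$ for every pair $(i,j)$ of types that differ exactly at the loci of $L$, and for every nonempty subset $L$ of
$[\![1;n]\!]$.

Let us check that this family satisfies $2^{|L|-1}\delta_L[m(s)](\emptyset) = \alpha_L$ for every nonempty subset $L$ of
$[\![1;n]\!]$. First, $m_L(s) = \sum_{B\subset L,\ |B|\geq 1} 2^{1-|B|}\alpha_B$. For every $i \in [\![1;n]\!]$ and $L \subset [\![1;n]\!]\setminus\{i\}$,

\[ \delta_{L\cup\{i\}}[m(s)](\emptyset) = \sum_{A\subset L}(-1)^{|L|-|A|}\bigl(m_{A\cup\{i\}}(s) - m_A(s)\bigr) = \sum_{A\subset L}(-1)^{|L|-|A|}\sum_{B\subset A} 2^{-|B|}\alpha_{B\cup\{i\}}. \]

We invert the double sum and use the formula

\[ \sum_{A\subset L \text{ s.t. } B\subset A} (-1)^{|L|-|A|} = \mathbf{1}_{\{B=L\}} \]

to obtain:

\[ \delta_{L\cup\{i\}}[m(s)](\emptyset) = \sum_{B\subset L} 2^{-|B|}\alpha_{B\cup\{i\}}\,\mathbf{1}_{\{B=L\}} = 2^{-|L|}\alpha_{L\cup\{i\}}. \]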
□
In particular, the n-locus Moran model with assortative mating based on the Hamming
criterion allows us to obtain, through diffusion approximation, any generator on $C^2([0,1]^n)$ of
the form:

\[ \frac{1}{2}\sum_{i=1}^{n} x_i(1-x_i)\frac{\partial^2}{\partial x_i^2} + \sum_{i=1}^{n}\Bigl((1-x_i)\mu_1 - x_i\mu_0 + (1/2-x_i)\,x_i(1-x_i)\sum_{\ell=0}^{n-1}\alpha_\ell \sum_{L\subset[\![1;n]\!]\setminus\{i\} \text{ s.t. } |L|=\ell}\ \prod_{k\in L} x_k(1-x_k)\Bigr)\frac{\partial}{\partial x_i}. \]

To see this, given any sequence $\alpha_0,\dots,\alpha_{n-1}$ of $n$ reals, we have to find $n+1$ real numbers
$s_0,\dots,s_n$ such that $\alpha_\ell = 2^\ell\,\delta^{(\ell+1)}[s](0)$. These are given by the inversion formula (A.3) in the
Appendix, from which we see that we may set $s_0 = 0$ and $s_k = \sum_{\ell=1}^{k}\binom{k}{\ell}2^{1-\ell}\alpha_{\ell-1}$ for $k \in [\![1;n]\!]$.
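The construction of a Hamming sequence from prescribed drift coefficients can be sketched as follows. The code assumes the inversion reads $s_0 = 0$, $s_k = \sum_{\ell=1}^{k}\binom{k}{\ell}2^{1-\ell}\alpha_{\ell-1}$ and then recovers the coefficients through $\alpha_\ell = 2^\ell\delta^{(\ell+1)}[s](0)$; the input `alphas` is a hypothetical example.

```python
from fractions import Fraction
from math import comb


def forward_difference(seq, order):
    """Apply the forward difference operator `order` times to a list."""
    out = list(seq)
    for _ in range(order):
        out = [out[i + 1] - out[i] for i in range(len(out) - 1)]
    return out


def hamming_sequence(alphas):
    """Return s_0, ..., s_n with s_0 = 0 and
    s_k = sum_{l=1}^{k} C(k, l) * 2^(1-l) * alphas[l-1]."""
    n = len(alphas)
    return [
        sum(comb(k, l) * Fraction(2) ** (1 - l) * alphas[l - 1] for l in range(1, k + 1))
        for k in range(n + 1)
    ]


def recovered_alphas(s):
    """alpha_l = 2^l * delta^(l+1)[s](0) for l = 0, ..., n-1."""
    n = len(s) - 1
    return [2 ** l * forward_difference(s, l + 1)[0] for l in range(n)]


alphas = [Fraction(3), Fraction(-1), Fraction(5, 2)]  # hypothetical alpha_0..alpha_2
s = hamming_sequence(alphas)
roundtrip = recovered_alphas(s)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the round trip can be compared without tolerance.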

5.2 The generator for two groups of loci 

Let us consider a partition of the set of loci into two subgroups, $G_1 = [\![1;k]\!]$ and
$G_2 = [\![k+1;n]\!]$, say. We introduce two sets of assortment parameters $s^{(1)}$ and $s^{(2)}$ depending
on subgroups of loci from $G_1$ and from $G_2$ respectively and satisfying assumption H4.
If we assume that the assortment parameter between two individuals of type $i$ and $j$ is
$s_{i,j} = s^{(1)}_{i_{|G_1},\,j_{|G_1}} + s^{(2)}_{i_{|G_2},\,j_{|G_2}}$ for every $i,j \in \mathcal{A}$ (where $i_{|G}$ denotes the restriction of the type $i$ to the loci of $G$), then $m_L(s) = m_{L\cap G_1}(s^{(1)}) + m_{L\cap G_2}(s^{(2)})$ for
every subset $L$ of $[\![1;n]\!]$. This implies that the first $k$ coordinates of the diffusion limit evolve
independently of the last $n-k$ coordinates and that the generator of the diffusion limit is:

\[ \mathcal{G}_{n,s} = \mathcal{G}_{k,s^{(1)}} \otimes \mathcal{G}_{n-k,s^{(2)}}. \]

Therefore, with these choices we can reduce our study to subgroups of loci having the same
influence on assortment.

5.3 Conditions for independent coordinates 

For some patterns of assortment, the allelic frequencies at each locus in a large population 
evolve approximately as independent diffusions: 

Proposition 5.1. Assume that the assortment parameters $s = \{s_{i,j},\ i,j \in \mathcal{A}\}$ satisfy
assumption H4.

1. The $n$ coordinates of the diffusion associated with the generator $\mathcal{G}_{n,s}$ are independent if
and only if the following condition holds:

\[ m_{A\cup\{i\}}(s) - m_A(s) = m_{\{i\}}(s) - m_\emptyset(s) \quad \text{for every } i \in [\![1;n]\!] \text{ and } A \subset [\![1;n]\!]\setminus\{i\}. \tag{H5} \]

2. If condition (H5) holds, the i-th coordinate behaves as the one-locus diffusion with as-

3. In particular,

(a) with the Hamming criterion, $\mathcal{G}_{n,s}$ is the generator of $n$ independent one-dimensional
diffusions if and only if the value of $s_{\ell+1} - s_\ell$ does not depend on $\ell$;

(b) with the additive criterion, $\mathcal{G}_{n,s}$ is the generator of $n$ independent one-dimensional
diffusions if and only if there exists a constant $c$ such that $s_{\ell+1} - s_\ell = c(2\ell+1)$ for
every $\ell \in [\![0;n-1]\!]$.

Proof. First note that $\mathcal{G}_{n,s}$ is the generator of $n$ independent diffusions if and only if the
polynomial term $P_{i,s}(x)$ is a constant function for every $i \in [\![1;n]\!]$.

1. According to formula (4.2), the polynomial term $P_{i,s}(x)$ is a constant function for
every $i \in [\![1;n]\!]$ whenever condition H5 holds. Conversely, assume that the polynomial
term is a constant function for every $i \in [\![1;n]\!]$. By formulae (4.3) and (4.6),
$\delta_L[m(s)](\emptyset) = 0$ for every subset $L$ of $[\![1;n]\!]$ having at least two elements. We derive, from
the inversion formula (A.2) stated in the Appendix, that for every subset $A \in \mathcal{P}([\![1;n]\!])$,

\[ \delta_i[m(s)](A) = \sum_{B\subset A}\delta_{B\cup\{i\}}[m(s)](\emptyset) = \delta_i[m(s)](\emptyset). \]

Therefore, condition H5 is satisfied.



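2. With the Hamming criterion, $m_A(s) = s_{|A|}$ and condition H5 is equivalent to

\[ s_{k+1} - s_k = s_1 - s_0 \quad \text{for every } k \in [\![1;n-1]\!]. \]

3. With the additive criterion, for a subset $L$ with $\ell$ elements, $m_L(s) = 2^{-\ell}\sum_{j=0}^{\ell}\binom{\ell}{j}s_{|2j-\ell|}$.
After some computation, we obtain for $i \notin L$

\[ m_{L\cup\{i\}}(s) - m_L(s) = 2^{-\ell}\sum_{j=0}^{\ell}\binom{\ell}{j}\bigl(s_{|2j-\ell+1|} - s_{|2j-\ell|}\bigr)
= \begin{cases}
2^{-\ell}\displaystyle\sum_{j=1}^{(\ell+1)/2}\binom{\ell}{\frac{\ell-1}{2}+j}\,\delta^{(2)}[s](2j-2) & \text{if } \ell \text{ is odd,}\\[2ex]
2^{-\ell}\Bigl(\displaystyle\sum_{j=1}^{\ell/2}\binom{\ell}{\frac{\ell}{2}+j}\,\delta^{(2)}[s](2j-1) + \binom{\ell}{\ell/2}\,\delta^{(1)}[s](0)\Bigr) & \text{if } \ell \text{ is even.}
\end{cases} \tag{5.1} \]

It follows from (5.1) that for every $c \in \mathbb{R}$, the system defined by

\[ m_{L\cup\{i\}}(s) - m_L(s) = c \quad \text{for every } i \in [\![1;n]\!] \text{ and } L \subset [\![1;n]\!]\setminus\{i\} \]

has a unique solution, which is $\delta^{(1)}[s](k) = c(2k+1)$ for every $k \in [\![0;n-1]\!]$.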

□ 
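The additive-criterion statement can be illustrated numerically. The sketch below assumes the expression $m(\ell) = 2^{-\ell}\sum_{j}\binom{\ell}{j}s_{|2j-\ell|}$ for a subset of $\ell$ loci and checks that a sequence with $s_{k+1} - s_k = c(2k+1)$ (that is, $s_k = ck^2$ when $s_0 = 0$) gives constant increments $m(\ell+1) - m(\ell) = c$, whereas a linear sequence does not. The values of `n` and `c` are hypothetical.

```python
from fractions import Fraction
from math import comb


def m_additive(s, l):
    """m(l) = 2^(-l) * sum_j C(l, j) * s[|2j - l|] for a subset of l loci."""
    return Fraction(sum(comb(l, j) * s[abs(2 * j - l)] for j in range(l + 1)), 2 ** l)


def increments(s):
    """List of m(l+1) - m(l) for l = 0, ..., n-1, where n = len(s) - 1."""
    n = len(s) - 1
    return [m_additive(s, l + 1) - m_additive(s, l) for l in range(n)]


n, c = 6, Fraction(3, 2)  # hypothetical number of loci and constant
quadratic = [c * k * k for k in range(n + 1)]      # s_{k+1} - s_k = c (2k + 1)
linear = [Fraction(k) for k in range(n + 1)]       # s_{k+1} - s_k = 1

inc_quadratic = increments(quadratic)
inc_linear = increments(linear)
```

For the quadratic sequence every entry of `inc_quadratic` equals `c`; for the linear one the increments vary with the number of loci.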

5.4 Behaviour at the boundaries 

In this section the trajectories of the coordinates of the limiting diffusion are compared 
with those of one-dimensional diffusions in order to investigate whether an allele can be (in- 
stantaneously) fixed at one of the loci. 

Consider the stochastic differential equations associated with the generator $\mathcal{G}_{n,s}$:

\[ dx_t(i) = \sqrt{x_t(i)(1-x_t(i))}\,dW_t(i) + b_i(x_t)\,dt \quad \forall i \in [\![1;n]\!], \tag{5.2} \]

where $(W_t(1))_{t\geq 0},\dots,(W_t(n))_{t\geq 0}$ denote $n$ independent standard Brownian motions, and

\[ b_i(x) = \mu_1(1-x(i)) - \mu_0 x(i) + (1/2 - x(i))\,x(i)(1-x(i))\,P_{i,s}(x) \quad \text{for } i \in [\![1;n]\!]. \]

Theorem 1 of Yamada & Watanabe (1971) ensures pathwise uniqueness for the stochastic
differential equation (5.2), since the drift is Lipschitz and the diffusion matrix is a diagonal
matrix of the form

\[ \sigma(x) = \mathrm{diag}\bigl(\sigma_1(x(1)),\dots,\sigma_n(x(n))\bigr), \]

where the functions $\sigma_i$ are 1/2-Hölder continuous functions.
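To give a feel for equation (5.2), here is a minimal Euler-Maruyama sketch for the two-locus case with the Hamming criterion. It assumes that for $n = 2$ the polynomial term reduces to $P_{i,s}(x) = \alpha_0 + \alpha_1 x_k(1-x_k)$, with $k$ the other locus, $\alpha_0 = s_1 - s_0$ and $\alpha_1 = 2(s_2 - 2s_1 + s_0)$. The step size, the clipping of the state to $[0,1]$ and the parameter values are illustrative choices of this sketch, not part of the model.

```python
import math
import random


def drift(x, mu0, mu1, s):
    """Drift b_i(x) of (5.2) for n = 2 under the Hamming criterion (assumed form)."""
    s0, s1, s2 = s
    a0 = s1 - s0                  # alpha_0 = delta^(1)[s](0)
    a1 = 2 * (s2 - 2 * s1 + s0)   # alpha_1 = 2 * delta^(2)[s](0)
    out = []
    for i in (0, 1):
        xi, xo = x[i], x[1 - i]
        poly = a0 + a1 * xo * (1 - xo)
        out.append(mu1 * (1 - xi) - mu0 * xi + (0.5 - xi) * xi * (1 - xi) * poly)
    return out


def simulate(x0, mu0, mu1, s, dt=1e-3, steps=5000, seed=0):
    """Euler-Maruyama path; the state is clipped to [0, 1] after each step
    (a numerical device only, the exact diffusion needs no clipping)."""
    rng = random.Random(seed)
    x = list(x0)
    path = [tuple(x)]
    for _ in range(steps):
        b = drift(x, mu0, mu1, s)
        for i in (0, 1):
            vol = math.sqrt(max(x[i] * (1 - x[i]), 0.0) * dt)
            x[i] = min(1.0, max(0.0, x[i] + b[i] * dt + vol * rng.gauss(0.0, 1.0)))
        path.append(tuple(x))
    return path


# Hypothetical run: mu0 = mu1 = 0.6, s1 - s0 = -2, s2 - s1 = -6.
path = simulate((0.5, 0.5), 0.6, 0.6, (0.0, -2.0, -8.0))
```

A fixed seed makes the path reproducible; smaller `dt` reduces the discretisation bias near the boundaries.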

The following proposition shows that, just as for the one-locus case, the boundary behaviour
of the solution to (5.2) depends only on the values of the mutation rates $\mu_0$ and $\mu_1$.
Proposition 5.2. Let $(x_t)_{t\geq 0}$ denote a solution of the stochastic differential equation (5.2)
starting from a point $x_0 \in\, ]0,1[^n$.

(i) If $\mu_1 = \mu_0 = 0$ then the diffusion process $(x_t)_t$ exits from $]0,1[^n$ in a finite time almost
surely.

(ii) If $\mu_1 = 0$ and $\mu_0 > 0$ then each coordinate of $(x_t)_t$ reaches the point 0 in a finite time
almost surely.

(iii) If $0 < \mu_1 < 1/2$ then 0 is attainable for each coordinate of the diffusion process:

\[ \mathbb{P}[\exists t > 0,\ x_t(i) = 0] > 0 \quad \forall i \in [\![1;n]\!]. \]

(iv) If $\mu_1 \geq 1/2$ then 0 is inaccessible for each coordinate of the diffusion process:

\[ \mathbb{P}[\exists t > 0,\ x_t(i) = 0] = 0 \quad\text{and}\quad \mathbb{P}\bigl[\lim_{t\to+\infty} x_t(i) = 0\bigr] = 0 \quad \text{for every } i \in [\![1;n]\!]. \]

Similar statements to (ii), (iii) and (iv) hold for the point 1 on exchanging the roles of $\mu_1$ and $\mu_0$.

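Proof. Let $i \in [\![1;n]\!]$. On $[0,1]^n$ the polynomial function $P_{i,s}$ is bounded above by

\[ M^+ = \sum_{A\subset[\![1;n]\!]\setminus\{i\}} 2^{-|A|}\max\{m_{A\cup\{i\}}(s) - m_A(s),\,0\} \]

and is bounded below by

\[ M^- = -\sum_{A\subset[\![1;n]\!]\setminus\{i\}} 2^{-|A|}\max\{-(m_{A\cup\{i\}}(s) - m_A(s)),\,0\}. \]

Let $b_i^+$ and $b_i^-$ denote the functions defined on $[0,1]$ by

\[ \begin{aligned}
b_i^+(u) &= \mu_1(1-u) - \mu_0 u + (1/2-u)\,u(1-u)\bigl(M^+\mathbf{1}_{\{u<1/2\}} + M^-\mathbf{1}_{\{u>1/2\}}\bigr),\\
b_i^-(u) &= \mu_1(1-u) - \mu_0 u + (1/2-u)\,u(1-u)\bigl(M^+\mathbf{1}_{\{u>1/2\}} + M^-\mathbf{1}_{\{u<1/2\}}\bigr),
\end{aligned} \]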

for every $u \in [0,1]$. For every $i \in [\![1;n]\!]$, pathwise uniqueness holds for the following two
stochastic differential equations:

\[ du_t = \sqrt{u_t(1-u_t)}\,dW_t(i) + b_i^+(u_t)\,dt \tag{5.3} \]

and

\[ du_t = \sqrt{u_t(1-u_t)}\,dW_t(i) + b_i^-(u_t)\,dt. \tag{5.4} \]

Let $\xi^+(i)$ and $\xi^-(i)$ be the solutions starting from $x_0(i)$ of the stochastic differential equations
(5.3) and (5.4) respectively. As the i-th coordinate of the drift is bounded above by $b_i^+$ and is
bounded below by $b_i^-$, the comparison theorem of Ikeda & Watanabe (1977) ensures that the
following inequalities hold with probability one:

\[ \xi_t^-(i) \leq x_t(i) \leq \xi_t^+(i) \quad \forall t \geq 0,\ \forall i \in [\![1;n]\!]. \tag{5.5} \]

The nature of the points 0 and 1 as described by the Feller classification is the same for $(\xi_t^+(i))_t$
and $(\xi_t^-(i))_t$ and depends only on $\mu_1$ and $\mu_0$. To describe their behaviours near 0, let $T_z^{\pm}(a,b)$
denote the first time the process $(\xi_t^{\pm}(i))_t$, starting from $z$, exits $(a,b)$ for $0 \leq a < z < b \leq 1$.

1. If $\mu_1 = \mu_0 = 0$ then 0 and 1 are absorbing points; $(\xi_t^{\pm}(i))_t$ reaches 0 or 1 in a finite time
with probability one and

\[ \mathbb{P}_z\Bigl[\lim_{t\to T_z^{\pm}(0,1)}\xi_t^{\pm}(i) = 0\Bigr] = \frac{\int_z^1 \exp\bigl(-\int_{1/2}^{x}\frac{2b_i^{\pm}(u)}{u(1-u)}\,du\bigr)\,dx}{\int_0^1 \exp\bigl(-\int_{1/2}^{x}\frac{2b_i^{\pm}(u)}{u(1-u)}\,du\bigr)\,dx}. \]

2. If $\mu_1 = 0$ and $\mu_0 > 0$ then 0 is the only absorbing point and $(\xi_t^{\pm}(i))_t$ reaches 0 in a finite
time with probability one.

3. If $0 < \mu_1 < 1/2$ and $\mu_0 > 0$ then 0 is attainable: for every $0 < z < b < 1$,

\[ \mathbb{P}_z\Bigl[T_z^{\pm}(0,b) < +\infty \text{ and } \lim_{t\to T_z^{\pm}(0,b)}\xi_t^{\pm}(i) = 0\Bigr] > 0. \]

4. If $\mu_1 \geq 1/2$ and $\mu_0 > 0$ then 0 is inaccessible: for every $0 < z < 1$,

\[ \mathbb{P}_z[\exists t > 0,\ \xi_t^{\pm}(i) = 0] = 0 \quad\text{and}\quad \mathbb{P}_z\bigl[\lim_{t\to+\infty}\xi_t^{\pm}(i) = 0\bigr] = 0. \]

Similar properties hold for the behaviour near the point 1.

These properties are sufficient to prove the boundary behaviour claimed for $(x_t)_t$. Since
$x_t(i) \leq \xi_t^+(i)$ for every $t \geq 0$, if $(\xi_t^+(i))_t$ reaches 0 in a finite time then so must $(x_t(i))_t$.
Similarly, if 0 is attainable for $(\xi_t^+(i))_t$ then 0 is also attainable for $(x_t(i))_t$. In the same way,
since $x_t(i) \geq \xi_t^-(i)$ for every $t \geq 0$, if 0 is inaccessible for $(\xi_t^-(i))_t$ then 0 is also inaccessible for
$(x_t(i))_t$.

It remains to prove that $(x_t)_t$ exits from $]0,1[^n$ in a finite time with probability one if
$\mu_1 = \mu_0 = 0$. Let $\epsilon > 0$ be small enough that $x_0 \in [\epsilon,1-\epsilon]^n$. The diffusion $x_t$ exits
from the compact $[\epsilon,1-\epsilon]^n$ in a finite time with probability one. Let $x^\epsilon$ be a point on the
boundary of $[\epsilon,1-\epsilon]^n$. There exists $i \in [\![1;n]\!]$ such that $x^\epsilon(i) \in \{\epsilon,1-\epsilon\}$. For $z \in\, ]0,1[$,
set $\phi_i^{\pm}(z) := \mathbb{P}_z[\lim_{t\to T_z^{\pm}(0,1)}\xi_t^{\pm}(i) = 0]$. By the comparison theorem applied to the solutions
of the stochastic differential equations (5.2), (5.3) and (5.4) starting from $x^\epsilon$, the probability
that the solution of (5.2) starting at $x^\epsilon$ reaches the boundary of $[0,1]^n$ in a finite time is
greater than $\phi_i^+(\epsilon)$ if $x^\epsilon(i) = \epsilon$ and is greater than $1 - \phi_i^-(1-\epsilon)$ if $x^\epsilon(i) = 1-\epsilon$. By the strong
Markov property, the probability that $(x_t)_t$ reaches the boundary in a finite time is greater
than $\min\{\min(\phi_i^+(\epsilon),\,1-\phi_i^-(1-\epsilon)),\ i \in [\![1;n]\!]\}$ for every $\epsilon > 0$. Therefore $(x_t)_t$ reaches the
boundary in a finite time with probability one. □

6 The stationary measure of the limiting diffusion

6.1 Existence of a stationary distribution and an expression for its density

As in the one-locus case, when the mutation rates are strictly positive, the diffusion has a
reversible stationary distribution:

Proposition 6.1. Assume that the hypothesis H4 holds and that the mutation rates $\mu_0$ and $\mu_1$
are strictly positive. Set $\bar{s}_{i,j} = s_{i,j} - s_{i,i}$ for every pair of types $i,j \in \mathcal{A}$. The diffusion with
generator $\mathcal{G}_{n,s}$ has a unique reversible stationary distribution which has the following density
with respect to the Lebesgue measure on $[0,1]^n$:

\[ g_{n,\mu,s}(x) = C_{n,\mu,s}\prod_{i=1}^{n}\bigl(x_i^{2\mu_1-1}(1-x_i)^{2\mu_0-1}\bigr)\exp\bigl(H_{n,\bar{s}}(x)\bigr), \]

where

• $H_{n,s}(x) = \frac{1}{2}\sum_{L\subset[\![1;n]\!],\ |L|\geq 1} m_L(s)\prod_{\ell\in L}\bigl(2x_\ell(1-x_\ell)\bigr)\prod_{k\in[\![1;n]\!]\setminus L}\bigl(1-2x_k(1-x_k)\bigr)$;

• $C_{n,\mu,s}$ is chosen so that $\int_{[0,1]^n} g_{n,\mu,s}(x_1,\dots,x_n)\,dx_1\cdots dx_n = 1$.

Remark 6.1. An expansion of the polynomial function $H_{n,\bar{s}}$ yields:

\[ H_{n,\bar{s}}(x) = \sum_{L\subset[\![1;n]\!],\ |L|\geq 1} 2^{|L|-1}\,\delta_L[m(s)](\emptyset)\prod_{\ell\in L} x_\ell(1-x_\ell). \]

Proof of Proposition 6.1. Let $\mathcal{G}_{n,0}$ denote the generator of the limiting diffusion in the random
mating case ($s_{i,j} = 0$ for every $i,j \in \mathcal{A}$). The diffusion associated with this generator is ergodic
and has a reversible stationary distribution $m_{\mu,0}$ which is the product of Beta distributions:
$m_{\mu,0} := (\mathrm{Beta}(2\mu_1, 2\mu_0))^{\otimes n}$. In the general case, the generator $\mathcal{G}_{n,s}$
can be decomposed as

\[ \mathcal{G}_{n,s} = \mathcal{G}_{n,0} + \frac{1}{2}\sum_{i=1}^{n} x_i(1-x_i)\,\partial_i h(x)\,\partial_i \]

where

\[ h(x) = \sum_{L\subset[\![1;n]\!],\ |L|\geq 1} 2^{|L|-1}\,\delta_L[m(s)](\emptyset)\prod_{\ell\in L} x_\ell(1-x_\ell). \]

Therefore, as explained in Ethier & Nagylaki (1989), we can apply a result of Fukushima
& Stroock (1986) to deduce that the diffusion associated with $\mathcal{G}_{n,s}$ has a unique reversible
stationary distribution $m_{\mu,s}$ given by

\[ m_{\mu,s}(dx) = C\exp(h(x))\,m_{\mu,0}(dx), \]

where $C$ is chosen so that $m_{\mu,s}$ is a probability distribution. □

6.2 Description of the density of the stationary measure 

We analyse the density of the stationary measure under two supplementary assumptions: 

Assumption H6: The two mutation rates $\mu_0$ and $\mu_1$ are assumed to be equal to a strictly
positive real number $\mu$.

Assumption H7: For every $L \in \mathcal{P}([\![1;n]\!])$, $m_L(s)$ depends only on $|L|$. We write $m(\ell)$ for
the common value of $m_L(s)$ for $L \in \mathcal{P}([\![1;n]\!])$ such that $|L| = \ell$.

Assumption H7 holds if the assortment parameters satisfy the Hamming criterion or the
additive criterion.

Under the hypotheses H1, H2, H3, H4, H6 and H7, the density of the invariant measure
can be written as $g_{n,\mu,s}(x) = C\exp(h_{n,\mu,s}(x))$ with

\[ h_{n,\mu,s}(x) = (2\mu-1)\sum_{i=1}^{n}\ln(p(x_i)) + \sum_{\ell=0}^{n-1}\alpha_\ell\sum_{L\subset[\![1;n]\!],\ |L|=\ell+1}\ \prod_{k\in L} p(x_k), \]

where $p(x_i) = x_i(1-x_i)$ and $\alpha_\ell = 2^\ell\,\delta^{(\ell+1)}[m](0)$.

The study of the invariant measure in the one-locus case already provides a precise image
of the graph of $g_{n,\mu,s}$ when the $n$ coordinates of the diffusion are independent, that is when the
assortment coefficients are chosen so that

\[ m(\ell+1) - m(\ell) = m(1) - m(0) \quad \text{for every } \ell \in \{0,\dots,n-1\}. \]



A. Etheridge and S. Lemaire 






There are then at least four different types of graph depending on the respective contributions to
allelic diversity of mutations ($\mu > 1/2$ or $0 < \mu < 1/2$) and assortment parameters ($m(1) - m(0)$
smaller than $-(8\mu - 4)$ or not) as shown in Fig. 1.

Proposition 6.2 gives conditions on the assortment parameters under which (1/2, . . . , 1/2) 
is the only critical point of the density, as in the random mating case. Proposition 6.3 deals 
with situations far from the random mating case (the proofs are postponed to §6.4). 

Proposition 6.2. We assume that the hypotheses H1, H2, H3, H4, H6 and H7 hold. Set

\[ V_n = 2\mu - 1 + 2^{-(n+1)}\sum_{\ell=0}^{n-1}\binom{n-1}{\ell}\,\delta^{(1)}[m](\ell). \]

1. If $V_n > 0$, then $(1/2,\dots,1/2)$ is a local maximum of $g_{n,\mu,s}$.

2. If $V_n < 0$, then $(1/2,\dots,1/2)$ is a local minimum of $g_{n,\mu,s}$.

3. If $\mu > 1/2$ and if $\delta^{(1)}[m](\ell) > -(8\mu-4)$ for every $\ell \in [\![0;n-1]\!]$, then $(1/2,\dots,1/2)$ is a global
maximum and is the only critical point of $g_{n,\mu,s}$.

4. If $0 < \mu < 1/2$ and if $\delta^{(1)}[m](\ell) < -(8\mu-4)$ for every $\ell \in [\![0;n-1]\!]$, then $(1/2,\dots,1/2)$ is a
global minimum and is the only critical point of $g_{n,\mu,s}$.

Example 6.1. Let us consider the additive criterion with the assortment sequence $s_\ell = b\ell$ for
$\ell \in [\![0;n]\!]$. Then $\delta^{(1)}[m](\ell) = b\,2^{-\ell}\binom{\ell}{\ell/2}\mathbf{1}_{\{\ell \text{ is even}\}}$. As $2^{-\ell}\binom{\ell}{\ell/2}$ is a strictly decreasing sequence
smaller than 1, $b < 0$ implies $V_n > 2\mu - 1 + \frac{1}{8}b$. Thus, it follows from Proposition 6.2 that
if $\mu > 1/2$ and $b > -8(2\mu-1)$, the point $(1/2,\dots,1/2)$ is a local maximum of $g_{n,\mu,s}$. Let us
note that if we consider the same sequence $s_\ell = b\ell$ but with the Hamming criterion, then for
$\mu > 1/2$ and $b < -4(2\mu-1)$, $(1/2,\dots,1/2)$ is a local minimum of $g_{n,\mu,s}$.
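The quantity $V_n$ is easy to evaluate for the two criteria of this example. The sketch below assumes $V_n = 2\mu - 1 + 2^{-(n+1)}\sum_{\ell}\binom{n-1}{\ell}\delta^{(1)}[m](\ell)$, with $\delta^{(1)}[m](\ell) = b$ for the Hamming criterion and $\delta^{(1)}[m](\ell) = b\,2^{-\ell}\binom{\ell}{\ell/2}\mathbf{1}_{\{\ell\text{ even}\}}$ for the additive criterion when $s_\ell = b\ell$. The numerical values of `mu`, `b` and `n` are hypothetical.

```python
from fractions import Fraction
from math import comb


def v_n(mu, d1):
    """V_n from the list d1[l] = delta^(1)[m](l), l = 0, ..., n-1."""
    n = len(d1)
    total = sum(comb(n - 1, l) * d1[l] for l in range(n))
    return 2 * mu - 1 + Fraction(total, 2 ** (n + 1))


def d1_hamming(b, n):
    """Hamming criterion with s_l = b*l: delta^(1)[m](l) = b."""
    return [Fraction(b)] * n


def d1_additive(b, n):
    """Additive criterion with s_l = b*l: b * 2^(-l) * C(l, l/2) for even l, else 0."""
    return [
        Fraction(b * comb(l, l // 2), 2 ** l) if l % 2 == 0 else Fraction(0)
        for l in range(n)
    ]


mu, b, n = Fraction(3, 5), Fraction(-3), 5  # hypothetical values
v_hamming = v_n(mu, d1_hamming(b, n))
v_additive = v_n(mu, d1_additive(b, n))
```

Here `v_hamming` equals $2\mu - 1 + b/4$, and for $b < 0$ the additive value stays above $2\mu - 1 + b/8$, in line with the bound used in the example.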

Remark 6.2. The statement of Proposition 6.2 can be easily extended to a family of assortment
parameters for which H7 does not hold: $V_n$ must be replaced by

\[ V_{n,i} = 2\mu - 1 + 2^{-(n+1)}\sum_{B\subset[\![1;n]\!]\setminus\{i\}}\delta_i[m(s)](B) \]

for every $i \in [\![1;n]\!]$ and the conditions on $\delta^{(1)}[m](\ell)$ in assertions 3 and 4 are replaced by a
condition on $\delta_i[m(s)](A)$ for every $i \in [\![1;n]\!]$ and $A \in \mathcal{P}([\![1;n]\!]\setminus\{i\})$.

The following proposition describes the properties of the critical points of the density in
two cases: (1) $\mu > 1/2$ and a condition on the assortment parameters which strongly favours
mating between individuals carrying similar types:

\[ \delta^{(1)}[m](n-1) \leq \delta^{(1)}[m](n-2) \leq \dots \leq \delta^{(1)}[m](0) \leq 0 \quad\text{and}\quad \delta^{(1)}[m](n-2) < 0, \]

and (2) $0 < \mu < 1/2$ and a condition on the assortment parameters which strongly favours
mating between individuals with dissimilar types:

\[ \delta^{(1)}[m](n-1) \geq \delta^{(1)}[m](n-2) \geq \dots \geq \delta^{(1)}[m](0) \geq 0 \quad\text{and}\quad \delta^{(1)}[m](n-2) > 0. \]

To simplify the statement, the description is limited to the hypercube $[0,1/2]^n$. The description
on the whole space $[0,1]^n$ can be deduced from this since $g_{n,\mu,s}(x)$ is invariant if we replace any
coordinate $x_i$ with $1 - x_i$.

Proposition 6.3. Assume that conditions H1, H2, H3, H4, H6 and H7 hold. Set

\[ V_n = 2\mu - 1 + 2^{-(n+1)}\sum_{k=0}^{n-1}\binom{n-1}{k}\,\delta^{(1)}[m](k). \]

1. Case $\mu > 1/2$. Assume furthermore that:

\[ \delta^{(1)}[m](n-1) \leq \delta^{(1)}[m](n-2) \leq \dots \leq \delta^{(1)}[m](0) \leq 0 \quad\text{and}\quad \delta^{(1)}[m](n-2) < 0. \]

(a) If $V_n \geq 0$ then $(1/2,\dots,1/2)$ is a global maximum and is the only critical point of
the density $g_{n,\mu,s}$.

(b) If $V_n < 0$ then

i. $g_{n,\mu,s}$ has a local minimum at $(1/2,\dots,1/2)$.

ii. In $[0,1/2]^n$, $g_{n,\mu,s}$ takes its maximum value at a unique point of the form
$(\xi_0,\dots,\xi_0)$.

iii. The other critical points of $g_{n,\mu,s}$ in $[0,1/2]^n$ are saddle points: for every
$\ell \in [\![1;n-1]\!]$, $g_{n,\mu,s}$ has $\binom{n}{\ell}$ saddle points of index $n-\ell$ in $[0,1/2]^n$. The saddle
points of index $n-\ell$ have $\ell$ coordinates equal to 1/2 and the other coordinates
have the same value denoted by $\xi_\ell$.

iv. The relative positions of the coordinates of the critical points in $[0,1/2]^n$ satisfy
$0 < \xi_{n-1} < \dots < \xi_0 < 1/2$.

v. The value of $g_{n,\mu,s}$ is the same at any saddle point of index $n-\ell$ and decreases
as $\ell$ increases.

2. Case $0 < \mu < 1/2$. Assume furthermore that:

\[ \delta^{(1)}[m](n-1) \geq \delta^{(1)}[m](n-2) \geq \dots \geq \delta^{(1)}[m](0) \geq 0 \quad\text{and}\quad \delta^{(1)}[m](n-2) > 0. \]

(a) If $V_n \leq 0$ then $(1/2,\dots,1/2)$ is a global minimum and is the only critical point of
the density $g_{n,\mu,s}$.

(b) If $V_n > 0$ then

i. $g_{n,\mu,s}$ has a local maximum at $(1/2,\dots,1/2)$.

ii. In $[0,1/2]^n$, $g_{n,\mu,s}$ takes its minimum value at a unique point of the form
$(\xi_0,\dots,\xi_0)$.

iii. The other critical points of $g_{n,\mu,s}$ in $[0,1/2]^n$ are saddle points: for every
$\ell \in [\![1;n-1]\!]$, $g_{n,\mu,s}$ has $\binom{n}{\ell}$ saddle points of index $\ell$ in $[0,1/2]^n$. The saddle
points of index $\ell$ have $\ell$ coordinates equal to 1/2 and the other coordinates
have the same value denoted by $\xi_\ell$.

iv. The relative positions of the coordinates of the critical points in $[0,1/2]^n$ satisfy
$0 < \xi_{n-1} < \dots < \xi_0 < 1/2$.

v. The value of $g_{n,\mu,s}$ is the same at any saddle point of index $n-\ell$ and increases
as $\ell$ increases.



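Remark 6.3.

1. $\xi_0 = 1/2 - \tfrac{1}{2}\sqrt{1-4\lambda_0}$ where $\lambda_0$ is the unique solution in $]0,1/4[$ of the equation:

\[ 2\mu - 1 + x\sum_{i=0}^{n-1}\delta^{(1)}[m](i)\binom{n-1}{i}(2x)^i(1-2x)^{n-1-i} = 0. \]

More generally, for every $\ell \in [\![0;n-1]\!]$, $\xi_\ell = 1/2 - \tfrac{1}{2}\sqrt{1-4\lambda_\ell}$ where $\lambda_\ell$ is the unique
solution in $]0,1/4[$ of the equation:

\[ 2\mu - 1 + x\sum_{i=0}^{n-1} B_{n-1,\ell,i}(2x)\,\delta^{(1)}[m](i) = 0 \tag{$S_\ell$} \]

and

\[ B_{n,\ell,i}(x) = 2^{-\ell}\sum_{j=0}^{\min(i,\ell)}\binom{\ell}{j}\binom{n-\ell}{i-j}\,x^{i-j}(1-x)^{n-\ell-(i-j)}. \]

Let us note that $(B_{n,\ell,j}(x))_{j=0,\dots,n}$ are positive on $]0,1[$ and their sum is equal to 1.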

2. The assumption that $\delta^{(1)}[m](i)$ is a decreasing function of $i$ cannot be removed since
one can find examples of assortment parameters satisfying $\delta^{(1)}[m](i) < 0$ for every
$i \in [\![0;n-1]\!]$ and such that:

(a) $\mu > 1/2$, $V_n > 0$, but $(1/2,\dots,1/2)$ is not the only local maximum,

(b) $V_n < 0$ and $g_{n,\mu,s}$ has more than $2^n$ local maxima.

3. If $x_i$ is the proportion of the population with allele 0 at the i-th locus, $2x_i(1-x_i)$ is the
probability that two individuals sampled at random from the population carry different
alleles at the i-th locus. The density function of the reversible measure takes its maximum
value at a point $x$ such that for each $i \in [\![1;n]\!]$, $x_i(1-x_i) = \lambda_0$.
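The equations $(S_\ell)$ of Remark 6.3 can be solved numerically by bisection. The sketch below assumes the form $2\mu - 1 + x\sum_i B_{n-1,\ell,i}(2x)\,\delta^{(1)}[m](i) = 0$ with the coefficients $B_{n,\ell,i}$ written as a mixture of binomial weights; with the Hamming criterion, $\delta^{(1)}[m](i) = s_{i+1} - s_i$. The two parameter sets are those quoted in the captions of Figures 3 and 5; the function names and the tolerance are choices of this sketch.

```python
from math import comb, sqrt


def b_coeff(n, l, i, y):
    """B_{n,l,i}(y) = 2^(-l) sum_j C(l,j) C(n-l,i-j) y^(i-j) (1-y)^(n-l-(i-j))."""
    return 2.0 ** (-l) * sum(
        comb(l, j) * comb(n - l, i - j) * y ** (i - j) * (1 - y) ** (n - l - (i - j))
        for j in range(min(i, l) + 1)
        if i - j <= n - l
    )


def phi(x, mu, d1, l):
    """Left-hand side of (S_l); d1[i] stands for delta^(1)[m](i)."""
    n = len(d1)
    return 2 * mu - 1 + x * sum(b_coeff(n - 1, l, i, 2 * x) * d1[i] for i in range(n))


def solve_lambda(mu, d1, l, tol=1e-13):
    """Bisection on ]0, 1/4[; requires a sign change between 0 and 1/4."""
    lo, hi = 0.0, 0.25
    f_lo = phi(lo, mu, d1, l)
    if f_lo * phi(hi, mu, d1, l) > 0:
        raise ValueError("no sign change on [0, 1/4]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f_lo * phi(mid, mu, d1, l) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Figure 3 parameters: mu = 0.6, s1 - s0 = -2, s2 - s1 = -6 (two loci).
lam0_fig3 = solve_lambda(0.6, [-2.0, -6.0], 0)
xi0_fig3 = 0.5 - 0.5 * sqrt(1 - 4 * lam0_fig3)

# Figure 5 parameters: mu = 1, s1 - s0 = -15, s2 - s1 = -210 (two loci).
lam1_fig5 = solve_lambda(1.0, [-15.0, -210.0], 1)
```

For the Figure 3 parameters the equation reduces to $0.2 - 2x - 8x^2 = 0$, whose root in $]0,1/4[$ is consistent with the value of about 0.0766 quoted in that caption.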

Example 6.2. Let us consider a quadratic sequence of parameters $s_\ell = s_0 - (b\ell + c\ell^2)$
for every $\ell \in [\![0;n]\!]$ and let us define the assortment with this sequence by means of the Hamming
criterion. If $c \geq 0$, $b + c > 0$ and $\mu > 1/2$ then $g_{n,\mu,s}$ has $3^n$ critical points if and only if $b + nc > 4(2\mu-1)$.
In this case, Aq = 77.""^^^ y^ ^^""^ + 0{n^^). If h^^k denotes the value of the function h^^^^s at a 
critical point of index n — /c then /i„ o~^n n ~ §77^ and /i„ o~^n i ~ n^/'^l/2^J'c{2^l^^\) 
(see Appendix A. 2 for more details). 

6.3 Graphs of the density and simulations of trajectories in the two and 
three locus cases 

Figures 2 to 4 show graphs of the density of the reversible stationary measure in the two-locus
case for $\mu = 0.6$ and for several values of $s_1 - s_0$ and $s_2 - s_1$, the assortative mating being
defined by the Hamming distance. Figures 2 and 3 illustrate the two situations considered in
Proposition 6.3 when $\mu > 1/2$. When $s_1 - s_0 = 0$, the density may have a continuum of critical
points as in Fig. 4; this corresponds to a case in which the assumption $\delta^{(1)}[m](n-2) < 0$ of
Proposition 6.3 is not satisfied.

To illustrate the evolution of the 0-allelic frequency when $\mu > 1/2$ and the assortative mating
strongly favours pairing between similar types, simulations were run in a population of size
N = 10^ with the two-locus model (Fig. 5) and with the three-locus model (Fig. 6). For these
simulations, every individual initially carries the allele 0 at every locus, recombination occurs
independently at each locus and the assortative mating is defined by the Hamming criterion.
The trajectory is plotted at intervals of size N between the iterations A^ and 33 A^. To help 
to visualize the evolution, the colour of the plot changes every ^A^ iterations. The form of 
the density of the stationary measure here is highly reminiscent of that of the fitness land- 
scapes studied in the adaptive evolution literature in modelling additive traits under frequency 
dependent intraspecific competition, see e.g. Schneider (2007) and references therein. In the 
deterministic setting the existence of multiple 'long term equilibria' renders the behaviour of 
the system very sensitive to assumptions about the initial conditions. In our setting, the pres- 
ence of genetic drift is sufficient for the population to (eventually) explore the neighbourhoods 
of all the maxima, irrespective of its starting point. The time spent by the population in the 
neighbourhood of a maximum depends on the assortment parameters (Fig. 6a and 6b).

Figure 2: Graph of $g_{2,\mu,s}$ when $\mu = 0.6$, $s_1 - s_0 = -0.4$ and $s_2 - s_1 = -0.6$ so that
the point (1/2, 1/2) is the only critical point of the density $g_{2,\mu,s}$.

Figure 3: Graph of $g_{2,\mu,s}$ when $\mu = 0.6$, $s_1 - s_0 = -2$ and $s_2 - s_1 = -6$ so that
$\lambda_0 \approx 0.0766$. A black dot marks the position of each extremum and a cross is plotted
at each saddle point.

Figure 4: Graph of $g_{2,\mu,s}$ when $\mu = 0.6$, $s_1 - s_0 = 0$ and $s_2 - s_1 = -12$; there is a
continuum of critical points.

Figure 5: Simulation of the evolution of the 0-allelic frequency in the two-locus model.
The population size is N = 10^, $\mu = 1$, $s_1 - s_0 = -15$, $s_2 - s_1 = -210$. A black dot
marks the position of each extremum and a cross is plotted at each saddle point. In this
example, $\lambda_0 \approx 0.034$ and $\lambda_1 \approx 0.008$.

Figure 6: Simulations of the evolution of the 0-allelic frequency with the three-locus model
for two different sets of assortment parameters. The assortative mating favours more strongly
pairing between similar types in Fig. 6b. The size of the population is N = 10^ and the
mutation rate is $\mu = 1$. A black dot marks the position of each global maximum of the
stationary density, a cross the position of each saddle point of index 2 and a diamond the
position of each saddle point of index 1. Some numerical characteristics of the stationary
density are presented to the right of each figure: for $i \in \{1,2,3\}$, the value of $\lambda_i = \xi_i(1-\xi_i)$
provides the position of the critical points of index $3-i$ (see Proposition 6.3) and $h_i$ is the
value of the log-density $h_{n,\mu,s}$ at a critical point of index $3-i$.

6.4 Proofs of Propositions 6.2 and 6.3

Proof of Proposition 6.2. Let us introduce some notation in order to shorten the expressions.
We set $\nu = 2\mu - 1$, $p(u) = u(1-u)$ for $u \in [0,1]$, $p(x) = (p(x_1),\dots,p(x_n))$,

\[ h(x) = \nu\sum_{i=1}^{n}\ln(x_i) + \frac{1}{2}\sum_{\ell=1}^{n} m(\ell)\sum_{L\subset[\![1;n]\!],\ |L|=\ell}\ \prod_{j\in L} 2x_j\prod_{k\in[\![1;n]\!]\setminus L}(1-2x_k) \]

and $h_{n,\mu,s}(x) = h(p(x))$ for $x = (x_1,\dots,x_n) \in\, ]0,1[^n$. With this notation, $g_{n,\mu,s}(x) = C_{n,\mu,s}\exp(h(p(x)))$.

1. For every $x \in\, ]0,1[^n$ and $i \in [\![1;n]\!]$, $\partial_i h_{n,\mu,s}(x) = (1-2x_i)\,\partial_i h(p(x))$ where

\[ \partial_i h(x) = \frac{\nu}{x_i} + \sum_{\ell=0}^{n-1}\delta^{(1)}[m](\ell)\sum_{L\subset[\![1;n]\!]\setminus\{i\},\ |L|=\ell}\ \prod_{j\in L} 2x_j\prod_{k\in[\![1;n]\!]\setminus(L\cup\{i\})}(1-2x_k). \]

First, the point $u_n = (1/2,\dots,1/2)$ is a critical point of $h_{n,\mu,s}$ and the Hessian matrix at
this point is the diagonal matrix $-2A\,I_n$ where

\[ A = 4\nu + 2^{-(n-1)}\sum_{\ell=0}^{n-1}\binom{n-1}{\ell}\,\delta^{(1)}[m](\ell) = 4V_n. \]

This proves the first two assertions of the proposition.

2. The last two assertions follow from the fact that $\partial_i h(x)$ and $A$ are increasing functions
of $\delta^{(1)}[m](\ell)$ for every $\ell$. Let us prove assertion 3 to illustrate the method. First, if
$\delta^{(1)}[m(s)](\ell) = -(8\mu-4)$ for every $\ell \in [\![0;n-1]\!]$ then the $n$ coordinates of the diffusion
are independent. In this case, $A = 0$ and the stationary density has only one critical
point at $(1/2,\dots,1/2)$ which is a maximum. If $\{s_{i,j},\ (i,j) \in \mathcal{A}^2\}$ is a family of assortment
parameters such that $\partial_i h(x)$ is nonnegative for every $x \in\, ]0,1/4]^n$ and the density $g_{n,\mu,s}$ has
a unique critical point at $(1/2,\dots,1/2)$ which is a maximum, then the same is true for any
family of assortment parameters $\{s'_{i,j},\ (i,j) \in \mathcal{A}^2\}$ such that $\delta^{(1)}[m(s')](\ell) > \delta^{(1)}[m(s)](\ell)$
for every $\ell \in [\![0;n-1]\!]$.

Proof of Proposition 6.3. We retain the notation introduced in the proof of Proposition 6.2.
For $k \in [\![0;n-1]\!]$, we set $\alpha_k = 2^k\,\delta^{(k+1)}[m](0)$ and denote by $e_{n,k}$ the elementary symmetric
polynomial function in $n$ variables of degree $k$:

\[ e_{n,0}(x) = 1 \quad\text{and}\quad e_{n,k}(x) = \sum_{L\subset[\![1;n]\!],\ |L|=k}\ \prod_{i\in L} x_i \quad \text{for } k \in [\![1;n]\!]. \]

For instance, $e_{n,1}(x) = x_1 + \dots + x_n$ and $e_{n,2}(x) = \sum_{1\leq i<j\leq n} x_i x_j$.
With this notation,

\[ h(x) = \nu\sum_{i=1}^{n}\ln(x_i) + \sum_{\ell=0}^{n-1}\alpha_\ell\,e_{n,\ell+1}(x). \]

In the proof we shall use (several times) the following identity for elementary symmetric
polynomial functions:

Lemma 6.1. Let $n$ be an integer greater than 1 and let $k \in [\![0;n-2]\!]$. For every $x \in \mathbb{R}^n$, set
$x^{(i)} = (x_1,\dots,x_{i-1},x_{i+1},\dots,x_n)$ for $i \in [\![1;n]\!]$ and
$x^{(i,j)} = (x_1,\dots,x_{i-1},x_{i+1},\dots,x_{j-1},x_{j+1},\dots,x_n)$ for $i,j \in [\![1;n]\!]$ such that $i < j$.

Then,

\[ x_i\,e_{n-1,k}(x^{(i)}) - x_j\,e_{n-1,k}(x^{(j)}) = (x_i - x_j)\,e_{n-2,k}(x^{(i,j)}). \tag{6.1} \]
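Identity (6.1) can be checked with exact rational arithmetic on a small example. This is only an illustrative sketch: the point `xs` is a hypothetical choice, indices are 0-based in the code, and the helper names are not from the paper.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def elementary(xs, k):
    """Elementary symmetric polynomial of degree k in the entries of xs
    (equal to 1 for k = 0 and to 0 when k exceeds len(xs))."""
    return sum(prod(c) for c in combinations(xs, k))


def drop(xs, *indices):
    """Copy of xs with the entries at the given (0-based) indices removed."""
    return [x for r, x in enumerate(xs) if r not in indices]


def identity_holds(xs, i, j, k):
    """Check x_i e_{n-1,k}(x^(i)) - x_j e_{n-1,k}(x^(j)) = (x_i - x_j) e_{n-2,k}(x^(i,j))."""
    left = xs[i] * elementary(drop(xs, i), k) - xs[j] * elementary(drop(xs, j), k)
    right = (xs[i] - xs[j]) * elementary(drop(xs, i, j), k)
    return left == right


xs = [Fraction(1, 7), Fraction(2, 5), Fraction(3, 11), Fraction(1, 2), Fraction(5, 13)]
n = len(xs)
all_ok = all(
    identity_holds(xs, i, j, k)
    for i, j in combinations(range(n), 2)
    for k in range(n - 1)
)
```

The check runs over every pair $i < j$ and every degree $k \in [\![0;n-2]\!]$.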

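We shall also use the following alternative expression for symmetric polynomial functions
that are similar to the polynomial term in $h$:

Lemma 6.2. Let $n \in \mathbb{N}^*$ and let $a_0,\dots,a_n$ be real numbers. Then for every $x \in \mathbb{R}^n$,

\[ \sum_{k=0}^{n} 2^k\,\delta^{(k)}[a](0)\,e_{n,k}(x) = \sum_{i=0}^{n} a_i\sum_{I\subset[\![1;n]\!],\ |I|=i}\ \prod_{r\in I} 2x_r\prod_{r\notin I}(1-2x_r). \]

In particular, for every $y \in \mathbb{R}$ and $\ell \in [\![0;n]\!]$,

\[ \sum_{k=0}^{n} 2^k\,\delta^{(k)}[a](0)\,e_{n,k}\bigl((1/4)^{\otimes\ell}, y^{\otimes(n-\ell)}\bigr) = \sum_{i=0}^{n} a_i\,B_{n,\ell,i}(2y), \]

where

\[ B_{n,\ell,i}(y) = 2^{-\ell}\sum_{j=0}^{\min(i,\ell)}\binom{\ell}{j}\binom{n-\ell}{i-j}\,y^{i-j}(1-y)^{n-\ell-(i-j)}. \]

Proof. See Corollary A.2. □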
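Both identities of Lemma 6.2 can be verified exactly on a small instance. The sketch below assumes the statement as written with $2x_r$ and $1-2x_r$ factors and with $B_{n,\ell,i}$ given by the double-binomial sum; the sequence `a`, the point `xs` and the values `l`, `y` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod


def elementary(xs, k):
    """Elementary symmetric polynomial of degree k in the entries of xs."""
    return sum(prod(c) for c in combinations(xs, k))


def forward_difference(seq, order):
    """Apply the forward difference operator `order` times to a list."""
    out = list(seq)
    for _ in range(order):
        out = [out[i + 1] - out[i] for i in range(len(out) - 1)]
    return out


def lhs(a, xs):
    """sum_k 2^k delta^(k)[a](0) e_{n,k}(xs)."""
    n = len(xs)
    return sum(2 ** k * forward_difference(a, k)[0] * elementary(xs, k) for k in range(n + 1))


def rhs(a, xs):
    """sum_i a_i sum_{|I|=i} prod_{r in I} 2 x_r prod_{r not in I} (1 - 2 x_r)."""
    n = len(xs)
    total = Fraction(0)
    for i in range(n + 1):
        for subset in combinations(range(n), i):
            total += a[i] * prod(2 * xs[r] if r in subset else 1 - 2 * xs[r] for r in range(n))
    return total


def b_coeff(n, l, i, y):
    """B_{n,l,i}(y) as in the statement of the lemma."""
    return Fraction(1, 2 ** l) * sum(
        comb(l, j) * comb(n - l, i - j) * y ** (i - j) * (1 - y) ** (n - l - (i - j))
        for j in range(min(i, l) + 1)
        if i - j <= n - l
    )


a = [Fraction(v) for v in (2, -1, 4, 0, 7)]          # a_0, ..., a_4, so n = 4
xs = [Fraction(1, 3), Fraction(2, 7), Fraction(1, 5), Fraction(3, 8)]
n, l, y = 4, 2, Fraction(1, 7)
special = [Fraction(1, 4)] * l + [y] * (n - l)
general_ok = lhs(a, xs) == rhs(a, xs)
special_ok = lhs(a, special) == sum(a[i] * b_coeff(n, l, i, 2 * y) for i in range(n + 1))
```

The coefficients $B_{n,\ell,i}(2y)$ also sum to 1 over $i$, which the test below checks for this instance.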

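1. Let us assume that $x = (x_1,\dots,x_n)$ is a critical point of $g_{n,\mu,s}$ different from $u_n$. Let
$\ell$ denote the number of coordinates equal to 1/2 ($\ell \in [\![0;n-1]\!]$). Every coordinate $x_i$
different from 1/2 has to satisfy $\partial_i h(p(x)) = 0$, that is

\[ \nu + p(x_i)\sum_{k=0}^{n-1}\alpha_k\,e_{n-1,k}\bigl(p(x)^{(i)}\bigr) = 0. \]

In particular, it follows from Lemma 6.1 that if $x_i$ and $x_j$ are two coordinates of the
critical point $x$ not equal to 1/2 then

\[ \bigl(p(x_i) - p(x_j)\bigr)\sum_{k=0}^{n-2}\alpha_k\,e_{n-2,k}\bigl(p(x)^{(i,j)}\bigr) = 0. \]

By Lemma 6.2,

\[ \sum_{k=0}^{n-2}\alpha_k\,e_{n-2,k}(x) = \sum_{\ell=0}^{n-2}\delta^{(1)}[m](\ell)\,Q_\ell(x), \]

where $Q_\ell$ denotes a polynomial function which is positive on $]0,1/4[^{n-2}$ for every
$\ell \in [\![0;n-2]\!]$. Thus this sum cannot vanish in $]0,1/4[^{n-2}$ under the assumption that all
coefficients $\delta^{(1)}[m](i)$ have the same sign and that for at least one $i \leq n-2$, $\delta^{(1)}[m](i)$
is non-zero. Hence all the coordinates of $x$ different from 1/2 share the same value of $p$, and such a critical point exists only if there exists a solution in the
interval $]0,1/4[$ of

\[ \nu + y\sum_{k=0}^{n-1}\alpha_k\,e_{n-1,k}\bigl((1/4)^{\otimes\ell}, y^{\otimes(n-\ell-1)}\bigr) = 0. \tag{$\tilde{S}_\ell$} \]

In order to study the solutions of $(\tilde{S}_\ell)$, let $\phi_\ell(y)$ denote the left-hand side of $(\tilde{S}_\ell)$:

\[ \phi_\ell(y) = \nu + y\sum_{k=0}^{n-1}\alpha_k\,e_{n-1,k}\bigl((1/4)^{\otimes\ell}, y^{\otimes(n-\ell-1)}\bigr). \tag{6.2} \]

By Lemma 6.2,

\[ \phi_\ell(y) = \nu + y\sum_{i=0}^{n-1} B_{n-1,\ell,i}(2y)\,\delta^{(1)}[m](i). \tag{6.3} \]

Therefore, $(\tilde{S}_\ell)$ coincides with $(S_\ell)$ of Remark 6.3. The derivative of $\phi_\ell$ is equal to:

\[ \phi_\ell'(y) = \sum_{i=0}^{n-1} B_{n-1,\ell,i}(2y)\,\delta^{(1)}[m](i) + 2y(n-1-\ell)\sum_{i=0}^{n-2} B_{n-2,\ell,i}(2y)\bigl(\delta^{(1)}[m](i+1) - \delta^{(1)}[m](i)\bigr). \]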

If(5(^)[m](n-1) < ••• < 5(^)H(0) < (respectively J(^)[m] (n-l) > ••• > 6^'^^[m]{0) > 0), 
(pi is a decreasing function on the interval [0, 1/2] (resp. an increasing function on the 
interval [0, 1/2]). The value of (j)£ at is and the value at 1/4 is Vn- Therefore, under 
the assumptions of 1 or 2 of the proposition, for every i £ {0, . . . ,n — 1} (iS^) has no 
solution in ]0, l/4[ if and u have the same sign and has exactly one solution in ]0, l/4[ 
denoted by if Vn and u have opposite signs. This proves assertions l.(a) and 2. (a). 
For every pair of disjoint subsets $I$ and $J$ of $[\![1;n]\!]$, let us introduce the following point:
$u_{I,J} = (x_1, \dots, x_n)$ with

$$x_i = \begin{cases} 1/2 & \text{if } i \in I, \\ 1/2 + \tfrac12\sqrt{1 - 4\lambda_{|I|}} & \text{if } i \in J, \\ 1/2 - \tfrac12\sqrt{1 - 4\lambda_{|I|}} & \text{if } i \in [\![1;n]\!] \setminus (I \cup J). \end{cases}$$

We have shown that if $\nu_n$ and $\nu$ have opposite signs, then every point $u_{I,J}$ is a critical
point and any critical point is one of these points.
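As a small numerical illustration of the points $u_{I,J}$, the following Python sketch builds such a point from a given value $\lambda \in ]0, 1/4[$ and lets one check that every coordinate different from 1/2 satisfies $x(1-x) = \lambda$. The function name `critical_point` and the sample value of $\lambda$ are assumptions made for the illustration; they do not come from the paper, and in the proof $\lambda$ would be the solution $\lambda_{|I|}$ of the corresponding equation.

```python
import math

def critical_point(n, I, J, lam):
    """Hypothetical helper: coordinates of u_{I,J} for a value lam in ]0, 1/4[.

    I and J are disjoint sets of loci in {1, ..., n}.
    """
    root = 0.5 * math.sqrt(1.0 - 4.0 * lam)
    point = []
    for i in range(1, n + 1):
        if i in I:
            point.append(0.5)          # coordinates indexed by I
        elif i in J:
            point.append(0.5 + root)   # coordinates indexed by J
        else:
            point.append(0.5 - root)   # remaining coordinates
    return point

# Illustrative value of lam only (not a solution computed from the model).
u = critical_point(4, I={1}, J={2}, lam=0.21)
```

Each coordinate outside $I$ is a root of $x(1-x) = \lambda$, which is why the two branches differ only by the sign in front of the square root.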



So that we may use our conclusions above, from now on, we assume that the hypotheses stated 
in point 1 of the proposition are satisfied. However, the computations that follow do not depend 
on these hypotheses, and so our proof is easily modified to the setting of point 2. 

2. Let us study the Hessian matrix of $h_{n,\mu,s}$ at a critical point $u_{I,J}$ such that $|I| \le n-1$.
For that, set $\ell = |I|$, $\ell^+ = |J|$ and $\ell^- = n - \ell - \ell^+$ and let us introduce the following
notations:



a, = 5lM(i)®^(A,f("-^)), 

Q = (l-4A,)a2 /,((l)®^,(A,f(«-^)) 



he 



:i-4A,)-2, 



The Hessian matrix of $h_{n,\mu,s}$ at $u_{I,J}$ is permutation-similar to the following block matrix:



(Ap \ 



B 



Co 



Ce B 



where 



• Ae denotes the scalar matrix —2aele with ae = (A^)®^" ^)), 

^be Ce ■■■ ce\ 

Ce ' ' ■ ' ' ■ '■ 

• Be^k denotes the following A:-by-fc matrix : Be^k = 

: ■•• ■•. ce 

\ce ••• Ce be/ 

• Ce denotes the i~^-hy-i~ matrix all the elements of which are equal to — q. 
By assumption on $\mu$, $b_\ell < 0$. To complete the proof of assertions (i) and (ii) of 1-(b),
we shall prove that $a_\ell < 0$ and that $b_\ell < c_\ell < 0$. That will imply that the submatrix

$$\begin{pmatrix} B_{\ell,\ell^+} & C_\ell \\ {}^t C_\ell & B_{\ell,\ell^-} \end{pmatrix}$$

is negative definite (for more details, see Lemma A.2), hence that the
Hessian matrix of $h_{n,\mu,s}$ at a point $u_{I,J}$ has $|I|$ positive eigenvalues and $n - |I|$ negative
eigenvalues.

First, let us study the sign of = ^u + YJlZo aien-i,i{{\T^'~^\>^f''~''^)- As cj^iiM) = 0, 
an application of Lemma 6.1 yields: 



n-2 

a, = (1 - 4A,) Y: a.e._2,((i)^(^-i), Af ("-^-^)). 



1=0 

The right-hand side can be rewritten using Lemma 6.2: 

n-2 



ai 



(1 - 4A,)^5W[m](0B„_2,£_i,i(2A,). 



i=0 



The conditions on (5(^)[m](i) imply that is negative. 

Let us now study the coefficients 5^ = (1 — 'iXi)~^bi and q = (1 — 4A£)^^C£. As in the 
study of ai we use that 4>t{\i) = and Lemma 6.2 to write bi and ci in terms of the 
coefficients 5^^\rn\{s){i): 

^ n—l 
n-2 

Ci = 2 J^(<5(i)H(i + 1) - 5(^)[m](f))i?„„2,A.(2A,). 

i=0 

As (5(^)[m(s)](i) is assumed to be a decreasing sequence, q < 0. After some computations, 
we obtain: 

n-2 



AKq - h) = - ^5(i)H(i)S„„2,Ai(2A, 



1=0 



The conditions on 5^^'^[m\{i) imply that q > bi. 



3. Let us prove that $0 < \lambda_{n-1} < \dots < \lambda_0 < 1/4$, which gives the relative positions of the
coordinates of the critical points.

Let $\ell \in [\![0;n-2]\!]$. If we return to the expression (6.2) of $\phi_\ell$, use Lemma 6.1 and then
Lemma 6.2, we obtain:

71-2 

- My) = - 2/) 5]a.+ie„_2,((l/4)®^y^("-2-^)) 

i=0 
n-2 

= 2y{l/4 -y)Y, ^^^^ N(i)S„_2,£,i(2y). 

i=0 

By assumption, $\delta^{(2)}[m](i) \le 0$ for every $i \in [\![0;n-2]\!]$, hence $\phi_{\ell+1}(y) \le \phi_\ell(y)$ for every
$y \in [0, 1/4]$. As the functions $\phi_\ell$ are decreasing on [0, 1/4], we deduce that $\lambda_{\ell+1} \le \lambda_\ell$ for
every $\ell \in [\![0;n-2]\!]$. As the critical points built from $\lambda_{\ell+1}$ and from $\lambda_\ell$ do not have the same
properties, they cannot coincide and thus $\lambda_{\ell+1} < \lambda_\ell$ for every $\ell \in [\![0;n-2]\!]$.

4. Proof of assertion l.{b).v: let hi denote the value of hn^^^s at a saddle point of index n — £: 
Pf, = ((1/2)®^ To prove that hi > h^+i for every ^ G [0;n - 2], we shall use 

the properties of the gradient dynamical system = —Vh{x) with h = —hn,^,s- Fix 
a positive value M large enough so that Um = M\) contains all critical points 

of $h$ (such an $M$ exists since $h(x)$ tends to infinity as $x$ tends to the boundary of $[0, 1]^n$).
The function $h$ decreases along trajectories and a trajectory of a point $x \in U_M$ converges
to a critical point of $h$ as $t$ tends towards $+\infty$, since $h$ has only isolated critical points.
For $k \in \{0, \dots, n-1\}$, let $U_M^{(k)}$ denote the subset:

$$U_M^{(k)} = \{x \in U_M,\ x_1 = \dots = x_k = 1/2 \text{ and } x_i < 1/2 \ \forall i > k\}.$$

Every subset $U_M^{(k)}$ contains exactly one critical point, the saddle point $p_k$. As $\partial_i h(x) = 0$
at points $x$ such that $x_i = 1/2$, the subset $U_M^{(k)}$ is positively invariant by the gradient flow.
Therefore, to prove that $h_k > h_{k+1}$, it is enough to show that there exists $0 < y_0 < 1/2$
such that for y £]yo,l/2[, /^((l/2)®^ y, ^f^f ^"i) < h{pk+i). 

As /^((l/2)®^y,4«_,Y'=-l) = - /^„,^,,((l/4)®^ y(l - y), Af^r'"')> it is enough to show 
that Af'j"-'^"'^) < 0. Using that Xk+ 1 is solution of the equation 

{£k+i)i we obtain 

n-2 

5fc+iV^,.((l/4)^('=+i),Afjr'"'^) = (1 -4Afe+i) j;5(i)[m](i)S„„2,M(2y) < 0. 

1=0 

7 Proof of convergence to the diffusion 

In this section, we prove convergence to the diffusion approximation in the n-locus case 
(Theorem 4.1). We also establish the two simple expressions for the drift presented in §4. 
First, the properties of the generator Qn,s stated in assertion (a) of Theorem 4.1 can be
obtained by applying the following theorem established by Cerrai and Clément:

Theorem 7.1 (Cerrai & Clément 2004). Let $S^+(\mathbb{R}^n)$ be the space of symmetric, non-negative
definite, $n \times n$ matrices. Let $A : [0,1]^n \to S^+(\mathbb{R}^n)$ and $b : [0,1]^n \to \mathbb{R}^n$ be mappings of class
$C^2$. For $i \in \{1, \dots, n\}$ and $\epsilon \in \{0,1\}$, let $\nu_i^\epsilon$ denote the unit inward normal vector at the
face $C_i^\epsilon = \{x \in [0,1]^n,\ x_i = \epsilon\}$ of the hypercube. Let us assume the following two conditions:

• for every $i \in \{1, \dots, n\}$, $\epsilon \in \{0,1\}$ and $x \in C_i^\epsilon$, $A(x)\nu_i^\epsilon(x) = 0$ and $\langle b(x), \nu_i^\epsilon(x)\rangle \ge 0$;

• for every $i,j \in \{1, \dots, n\}$, $A_{ij}(x)$ depends only on $x_i$ and $x_j$.

Then the operator 



is closable in $C([0,1]^n)$ and its closure is the generator of a strongly continuous semigroup of
contractions. 

To prove the convergence result, we use the following theorem, due to Ethier and Nagylaki, 
on diffusion approximations for Markov chains with two time scales. 

Theorem 7.2 (Ethier & Nagylaki 1980, Theorem 3.3). For $N \in \mathbb{N}^*$, let $\{Z^{(N)}_k, k \in \mathbb{N}\}$ be a
homogeneous Markov chain in a metric space $E_N$ with Feller transition function. Let $F_1$ and
$F_2$ be compact convex subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, having non-empty interiors. Assume
further that $0 \in F_2$. Let $\Phi_N : E_N \to F_1$ and $\Psi_N : E_N \to F_2$ be continuous functions. Define
$X^{(N)}_k = \Phi_N(Z^{(N)}_k)$ and $Y^{(N)}_k = \Psi_N(Z^{(N)}_k)$ for each $k \in \mathbb{N}$. Let $(\varepsilon_N)_N$ and $(\delta_N)_N$ be two positive
sequences such that $\delta_N \to 0$ and $\varepsilon_N/\delta_N \to 0$.

Assume that there exist continuous functions $a : F_1 \times \mathbb{R}^m \to \mathbb{R}^n \otimes \mathbb{R}^n$, $b : F_1 \times \mathbb{R}^m \to \mathbb{R}^n$ and
$c : F_1 \times \mathbb{R}^m \to \mathbb{R}^m$ such that for $i,j \in [\![1;n]\!]$ and $\ell \in [\![1;m]\!]$ the following properties (a)-(e)
hold as $N \to +\infty$ uniformly in $z \in E_N$, where $x = \Phi_N(z)$ and $y = \Psi_N(z)$:




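(a) $\varepsilon_N^{-1}\, \mathbb{E}_z[X^{(N)}_1(i) - x(i)] = b_i(x,y) + o(1)$

(b) $\varepsilon_N^{-1}\, \mathbb{E}_z[(X^{(N)}_1(i) - x(i))(X^{(N)}_1(j) - x(j))] = a_{i,j}(x,y) + o(1)$

(c) $\varepsilon_N^{-1}\, \mathbb{E}_z[(X^{(N)}_1(i) - x(i))^4] = o(1)$

(d) $\delta_N^{-1}\, \mathbb{E}_z[Y^{(N)}_1(\ell) - y(\ell)] = c_\ell(x,y) + o(1)$

(e) $\delta_N^{-1}\, \mathbb{E}_z[(Y^{(N)}_1(\ell) - y(\ell))^2] = o(1)$.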
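Assume further that

(f) $c$ is of class $C^2$, $c(x,0) = 0$ for all $x \in \mathbb{R}^n$ and the solution of the differential equation

$$\frac{d}{dt}u(t,x,y) = c(x, u(t,x,y)), \qquad u(0,x,y) = y,$$

exists for all $(t,x,y) \in [0,+\infty[ \times F_1 \times F_2$ and satisfies

$$\lim_{t \to +\infty}\ \sup_{(x,y) \in F_1 \times F_2} |u(t,x,y)| = 0.$$

(g) The closure of the following operator

$$\mathcal{L} = \frac12 \sum_{i,j=1}^{n} a_{i,j}(x,0)\frac{\partial^2}{\partial x_i \partial x_j} + \sum_{i=1}^{n} b_i(x,0)\frac{\partial}{\partial x_i}, \qquad \mathcal{D}(\mathcal{L}) = C^2(F_1),$$

generates a strongly continuous semigroup on $C(F_1)$ corresponding to a diffusion process
$X$ in $F_1$.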

Then the following conclusions, in which the symbol $\Rightarrow$ denotes convergence in distribution,
hold:

(i) If $X^{(N)}_0 \Rightarrow X(0)$ then $\{X^{(N)}_{[t/\varepsilon_N]}, t \ge 0\} \Rightarrow X(\cdot)$ in $D_{F_1}([0,+\infty[)$ (where $D_{F_1}([0,+\infty[)$ is the
space of càdlàg paths $\omega : [0,\infty) \to F_1$ with the Skorohod topology).

(ii) For every positive sequence $(t_N)_N$ that converges to $+\infty$, $Y^{(N)}_{[t_N/\delta_N]} \Rightarrow 0$.

Remark 7.1. We have only stated the part of Ethier and Nagylaki's theorem that we need.
The full statement also gives a convergence result when the sequence $(\delta_N)_N$ converges to a
positive real number.

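To apply this theorem, we consider the two sequences $\varepsilon_N = N^{-2}$ and $\delta_N = N^{-1}$, we set
$E_N = \{z \in (N^{-1}\mathbb{N})^{\mathcal{A}},\ \sum_{i \in \mathcal{A}} z(i) = 1\}$, and we define by $(\Phi_N, \Psi_N)$ a change of coordinates
such that $\Psi_N^{-1}(\{0\})$ is the linkage equilibrium manifold:

$$\Phi_N : E_N \to [0,1]^n,\quad z \mapsto (u_1, \dots, u_n) \qquad\text{and}\qquad \Psi_N : E_N \to [-1,1]^{2^n-n-1},\quad z \mapsto (u_I,\ I \subset [\![1;n]\!] \text{ s.t. } |I| \ge 2)$$

where $u_i = \sum_{\ell \in \mathcal{A},\ \ell_i = 0} z(\ell)$ for $i \in [\![1;n]\!]$ and $u_I = \prod_{i \in I} u_i - \sum_{\ell \in \mathcal{A},\ \ell|_I \equiv 0} z(\ell)$ for $I \subset [\![1;n]\!]$
having at least two elements.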
First (in §7.1), we shall check that $X^{(N)}_k = \Phi_N(Z^{(N)}_k)$ and $Y^{(N)}_k = \Psi_N(Z^{(N)}_k)$ satisfy the
conditions (a)-(f) of Ethier and Nagylaki's theorem with the following expressions for the
functions $a_{i,j}(x,0)$ and $b_i(x,0)$:

$$a_{i,j}(x,0) = x(i)(1 - x(i))\,\mathbb{1}_{\{i=j\}}, \qquad (7.1)$$

$$b_i(x,0) = (1 - x(i))\mu_1 - x(i)\mu_0 + (1/2 - x(i))\,x(i)(1 - x(i))\,P_{i,s}(x), \qquad (7.2)$$

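where

$$P_{i,s}(x) = \sum_{J \subset [\![1;n]\!]\setminus\{i\}}\ \sum_{H \subset [\![1;n]\!]\setminus\{i\}} (s_{J\cup\{i\},H} - s_{J,H}) \prod_{j \in J} x(j) \prod_{h \in H} x(h) \prod_{j \in [\![1;n]\!],\ j \notin J\cup\{i\}} (1 - x(j)) \prod_{h \in [\![1;n]\!],\ h \notin H\cup\{i\}} (1 - x(h))$$

and, for two subsets $I$ and $J$ of $[\![1;n]\!]$, $s_{I,J}$ denotes the assortment parameter $s_{i,j}$ for the types
$i = (0_I, 1_{\bar I})$ and $j = (0_J, 1_{\bar J})$.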

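In §7.2 we shall show that $P_{i,s}$ has the following two equivalent expressions:

$$P_{i,s}(x) = \sum_{A \subset [\![1;n]\!]\setminus\{i\}} 2^{|A|}\,\delta_{A\cup\{i\}}[m(s)](\emptyset) \prod_{\ell \in A} x(\ell)(1 - x(\ell))$$

$$= \sum_{A \subset [\![1;n]\!]\setminus\{i\}} \delta_i[m(s)](A) \prod_{k \in A} \big(2x(k)(1 - x(k))\big) \prod_{\ell \notin A\cup\{i\}} \big(1 - 2x(\ell)(1 - x(\ell))\big).$$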

7.1 Verification of the conditions (a)-(f) of Ethier and Nagylaki's theorem 

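As the proportion of individuals of a given type $i$ can only change by $\pm 1/N$ in one step:

• if $r \in \mathbb{N}^*$ and $i \in \mathcal{A}$, then

$$\mathbb{E}_z\big[(Z^{(N)}_1(i) - z(i))^r\big] = N^{-r} \sum_{j \in \mathcal{A}\setminus\{i\}} \big(f_N(z,j,i) + (-1)^r f_N(z,i,j)\big) \qquad (7.3)$$

• if $r, r' \in \mathbb{N}^*$ and $i,j \in \mathcal{A}$ are such that $i \ne j$, then

$$\mathbb{E}_z\big[(Z^{(N)}_1(i) - z(i))^r (Z^{(N)}_1(j) - z(j))^{r'}\big] = N^{-(r+r')}\big((-1)^r f_N(z,i,j) + (-1)^{r'} f_N(z,j,i)\big) \qquad (7.4)$$

• if $r \ge 3$ and $i^{(1)}, \dots, i^{(r)} \in \mathcal{A}$ are such that at least three of them are distinct, then

$$\mathbb{E}_z\Big[\prod_{u=1}^{r} \big(Z^{(N)}_1(i^{(u)}) - z(i^{(u)})\big)\Big] = 0. \qquad (7.5)$$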
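The three moment formulae can be checked by direct enumeration on a toy example. The sketch below assumes a hypothetical table `f` in which `f[(i, j)]` plays the role of $f_N(z,i,j)$, read here as the probability that, in one step, an individual of type $i$ is replaced by an individual of type $j$; the numerical values are arbitrary and are not derived from the model.

```python
N = 50
# Hypothetical one-step replacement probabilities f[(i, j)], i != j (sum <= 1).
f = {(0, 1): 0.10, (1, 0): 0.20, (0, 2): 0.05,
     (2, 0): 0.15, (1, 2): 0.10, (2, 1): 0.05}
types = (0, 1, 2)

def moment_direct(exps):
    """E[prod_t (Z_1(t) - z(t))^exps[t]] by enumerating the possible replacements.

    With the remaining probability nothing changes and the product is zero,
    since every exponent is at least 1.
    """
    total = 0.0
    for (i, j), p in f.items():
        term = p
        for t, r in exps.items():
            dz = (-1.0 / N if t == i else 0.0) + (1.0 / N if t == j else 0.0)
            term *= dz ** r
        total += term
    return total

def formula_73(i, r):
    return N ** (-r) * sum(f[(j, i)] + (-1) ** r * f[(i, j)]
                           for j in types if j != i)

def formula_74(i, j, r, rp):
    return N ** (-(r + rp)) * ((-1) ** r * f[(i, j)] + (-1) ** rp * f[(j, i)])

err_73 = max(abs(moment_direct({i: r}) - formula_73(i, r))
             for i in types for r in (1, 2, 3, 4))
err_74 = max(abs(moment_direct({0: r, 1: rp}) - formula_74(0, 1, r, rp))
             for r in (1, 2, 3) for rp in (1, 2, 3))
# Three distinct types involved: the product always vanishes, as in (7.5).
m_75 = moment_direct({0: 1, 1: 1, 2: 1})
```

The check is elementary: a single replacement changes at most two coordinates of $z$, one by $-1/N$ and one by $+1/N$, which is all that (7.3)-(7.5) encode.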
Condition (a). To show that condition (a) of Theorem 7.2 holds, we first examine the drift
of $Z^{(N)}$. A Taylor expansion of the transition probabilities of the Markov chain $(Z^{(N)}_t)_{t \in \mathbb{N}}$
using assumption H2 yields the following formula:

Lemma 7.1. For every $i \in \mathcal{A}$,

$$N^2\, \mathbb{E}_z[Z^{(N)}_1(i) - z(i)] = N B^{(1)}_i(z) + B^{(2)}_i(z) + O(N^{-1}), \quad \text{uniformly on } z \in E_N,$$

where 

n n 

keAjeA u=i u=i 

+ X! X] ^3,kz{3)z{k)q{{j, k); i) - z{i) ^ Si^kz{k) 
keAj£A k&A 

~ ^3,hzU)z{h)z{k)q{{j,k); i) + z{i) ^ ^ Si^hz{k)z{h) 

keAjeAheA heAkeA 

Proof. By assumption H2, for two different types i,j G A 

fr,{z,i,j):= z{i)z{k)w^''\z,i,k)q{ii,k);i)i^(''\e,j). 
k,eeA 

where w^^^ {z, i,k) = l + ^ {si^k - EheA Si,hz{h)) + 0{N''^) and 



i^i-j, + 0(iV-2) if ^^^^j^ = 1 and £i = 1 - ji 



To prove Lemma 7.1, it suffices to use these expansions in

$$\mathbb{E}_z[Z^{(N)}_1(i) - z(i)] = N^{-1} \sum_{j \in \mathcal{A}\setminus\{i\}} \big(f_N(z,j,i) - f_N(z,i,j)\big)$$

and to simplify. □

Let $u \in [\![1;n]\!]$. To establish an expression for the drift of $X^{(N)}(u)$, we must compute
$\sum_{i \in \mathcal{A},\ i_u = 0} B^{(1)}_i(z)$ and $\sum_{i \in \mathcal{A},\ i_u = 0} B^{(2)}_i(z)$. Direct computations yield:

Lemma 7.2. For every $u \in [\![1;n]\!]$ and $z \in E_N$,

$$\sum_{i \in \mathcal{A},\ i_u = 0} B^{(1)}_i(z) = 0, \qquad (7.6)$$

$$\sum_{i \in \mathcal{A},\ i_u = 0} B^{(2)}_i(z) = (1 - x(u))\mu_1 - x(u)\mu_0 + \tfrac12 G_u(z) \qquad (7.7)$$

where 

^iu) = Yl "''^^ = XI X] ^(■?)^(^)*i,'*(I{iu=o} -2;(n)). 

iG.4, i„=0 jG»4 h€A 

Proof. For e G {0, 1} and i € A, let au\i) denote the type i modified by setting the ahele e at 
the locus u. We shall use the following formula several times: 

q{{j,k)-a'i:\i)) = +f(n)(I|fc,„=,} -%„=,}) (7.8) 

i&A, iu=0 

with r{u) = J2icii-M\M = ^ ^y assumption HI. 
First, formula (7.8) with e = provides 

Y Bf\z)= <3) + r{u)YY.^h^u=e}-\,.=.})- E ^W = 0- 

i&A, iu=o jeA, ju=o jeAkeA ieA, iu=o 

Let B^^'''\z) denote the j-th line of the expression of B^^\z) for j G {1,2,3}. 
As ^((i) does not depend on the value of a if u / 

ieA, iu=0,ix=a 

Y b\'''\z) = YY<3)z{k) Y {q{{3,ky,a^^\i})^^, - q{{j,k)■a^^\^))^^o). 
ieA, iu=o keAjeA ieA, iu=o 

Applying (7.8) again, we obtain: 

Y bI^'^Hz) = {'^ - x{u))ni - x{u)fio- 
ieA, iu=0 

Due to the symmetry of the parameters: Sij = Sj^i for i,j G A, we have: 

Y ^P^(-) = o. 

ieA, iu=0 

Finally, computations using (7.8) yet again yield: 

Y = ^G„(z). 

ieA, iu=0 

□ 
To obtain condition (a), it remains to express $G_u(z)$ in the new coordinates. The following
lemma describes the inverse of the change of coordinates $(\Phi_N, \Psi_N)$:

Lemma 7.3. For $z \in E_N$ and $L \subset [\![1;n]\!]$, set $x(L) = \sum_{i \in \mathcal{A},\ i|_L \equiv 0} z(i)$ with the convention
$x(\emptyset) = 1$, and $y(L) = \prod_{\ell \in L} x(\ell) - x(L)$ if $|L| \ge 2$. Then for every $J \subset [\![1;n]\!]$,

$$z(0_J, 1_{\bar J}) = \prod_{i \in J} x(i) \prod_{j \notin J} (1 - x(j)) - \sum_{I \subset [\![1;n]\!] \text{ s.t. } J \subset I,\ |I| \ge 2} (-1)^{|I|-|J|}\, y(I). \qquad (7.9)$$

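Proof. First, by induction on $n - |J|$, we show that

$$z(0_J, 1_{\bar J}) = \sum_{I \subset [\![1;n]\!] \text{ s.t. } J \subset I} (-1)^{|I|-|J|}\, x(I). \qquad (7.10)$$

Since $z(0) = x([\![1;n]\!])$, the equality (7.10) holds for $J = [\![1;n]\!]$.
Let $m \in [\![1;n]\!]$. Assume that the formula (7.10) holds for every subset $J$ of $[\![1;n]\!]$ such that
$|J| \ge m$. Let $K$ be a subset of $[\![1;n]\!]$ with $m - 1$ elements. By definition of $x(K)$,

$$z(0_K, 1_{\bar K}) = x(K) - \sum_{L \subset [\![1;n]\!] \text{ s.t. } K \subsetneq L} z(0_L, 1_{\bar L}).$$

We apply the formula (7.10) to every term in the sum and we invert the double sum we have
obtained:

$$z(0_K, 1_{\bar K}) = x(K) - \sum_{H \text{ s.t. } K \subsetneq H} x(H) \Big( \sum_{L \text{ s.t. } K \subsetneq L \subset H} (-1)^{|H|-|L|} \Big).$$

The sum between parentheses is equal to

$$\sum_{v=1}^{|H|-|K|} (-1)^{|H|-|K|-v} \binom{|H|-|K|}{v} = -(-1)^{|H|-|K|}.$$

Thus the formula (7.10) is also satisfied for the subset $K$, which completes the induction.

To complete the proof, we replace $x(I)$ in (7.10) with $\prod_{i \in I} x(i) - y(I)$ for every subset $I$ having
at least two elements and use the following equality:

$$\sum_{I \subset [\![1;n]\!],\ J \subset I} (-1)^{|I|-|J|} \prod_{i \in I} x(i) = \prod_{j \in J} x(j) \Big( \sum_{L \subset [\![1;n]\!]\setminus J} (-1)^{|L|} \prod_{i \in L} x(i) \Big) = \prod_{j \in J} x(j) \prod_{j \in [\![1;n]\!]\setminus J} (1 - x(j)).$$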

□ 
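Formula (7.9) is easy to test numerically for a small number of loci. The sketch below draws a random frequency vector $z$ on $\{0,1\}^3$, computes the coordinates $x(L)$ and $y(L)$ as in Lemma 7.3, and compares both sides of (7.9) for every subset $J$. The choice $n = 3$, the random seed and all function names are assumptions made for the illustration; loci are indexed from 0 in the code.

```python
import itertools
import math
import random

n = 3
loci = range(n)
types = list(itertools.product((0, 1), repeat=n))

random.seed(0)
w = [random.random() for _ in types]
z = {t: wi / sum(w) for t, wi in zip(types, w)}  # random type frequencies

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in itertools.combinations(s, r)]

def x(L):
    """Frequency of the types carrying allele 0 at every locus of L (x(empty) = 1)."""
    return sum(p for t, p in z.items() if all(t[l] == 0 for l in L))

def y(L):
    """Linkage disequilibrium coordinate, defined for |L| >= 2."""
    return math.prod(x({l}) for l in L) - x(L)

def rhs_79(J):
    main = (math.prod(x({i}) for i in J)
            * math.prod(1 - x({j}) for j in loci if j not in J))
    corr = sum((-1) ** (len(I) - len(J)) * y(I)
               for I in subsets(loci) if J <= I and len(I) >= 2)
    return main - corr

def type_of(J):
    """Type with allele 0 exactly at the loci of J."""
    return tuple(0 if l in J else 1 for l in loci)

max_err = max(abs(z[type_of(J)] - rhs_79(J)) for J in subsets(loci))
```

Up to floating-point rounding, `max_err` vanishes, which reflects that $(x, y)$ is a genuine change of coordinates on $E_N$.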

To shorten the notation, set 
• $\Lambda_u = [\![1;n]\!] \setminus \{u\}$ for $u \in [\![1;n]\!]$,

• $\Pi_J(v) = \prod_{j \in J} v(j)$ for $v \in [0,1]^n$ and $J \in \mathcal{P}([\![1;n]\!])$ with the usual convention $\Pi_\emptyset = 1$,

• $s_{I,J} = s_{i,j}$ for $i = (0_I, 1_{\bar I})$ and $j = (0_J, 1_{\bar J})$.

With this notation, for every $J \subset \Lambda_u$,

• $z(0_J, 1_{\bar J}) = (1 - x(u))\,\Pi_J(x)\,\Pi_{\Lambda_u \setminus J}(1 - x) - R_J(y)$,

• $z(0_{J\cup\{u\}}, 1_{\overline{J\cup\{u\}}}) = x(u)\,\Pi_J(x)\,\Pi_{\Lambda_u \setminus J}(1 - x) - R_{J\cup\{u\}}(y)$,

where $R_J(y)$ and $R_{J\cup\{u\}}(y)$ denote polynomial functions that vanish at $y = 0$. Therefore,

$$G_u(z) = x(u)(1 - x(u)) \sum_{J,H \subset \Lambda_u} \Pi_J(x)\,\Pi_H(x)\,\Pi_{\Lambda_u \setminus J}(1 - x)\,\Pi_{\Lambda_u \setminus H}(1 - x) \times \Big( x(u)\big(s_{J\cup\{u\},H\cup\{u\}} - s_{J,H\cup\{u\}}\big) + (1 - x(u))\big(s_{J\cup\{u\},H} - s_{J,H}\big) \Big) + R_u(x,y),$$

where $R_u(x,y)$ is a polynomial function in the variables $x(1), \dots, x(n)$ and $y(I)$ for $I \subset [\![1;n]\!]$
such that $|I| \ge 2$, that vanishes in the equilibrium manifold: $R_u(x,0) = 0$.
The expression for $G_u(z)$ can be simplified by using the two assumptions H4 on the assortment
parameters, that is $s_{J,H} = s_{H,J}$ for every $J, H \subset [\![1;n]\!]$ and $s_{J\cup\{u\},H\cup\{u\}} = s_{J,H}$ for every
$u \in [\![1;n]\!]$ and $J, H \subset \Lambda_u$:

= (1 - 2x{u))x{u){l - x(n))x 

In summary, we have established the following expansion of the drift of $X^{(N)}$:
Lemma 7.4. Assume that hypotheses H1, H2, H3 and H4 hold. For every $i \in [\![1;n]\!]$,

$$N^2\, \mathbb{E}_z[X^{(N)}_1(i) - x(i)] = (1 - x(i))\mu_1 - x(i)\mu_0 + (1/2 - x(i))\,x(i)(1 - x(i))\,P_{i,s}(x) + R_i(x,y) + O(N^{-1}) \qquad (7.11)$$

uniformly on $z \in E_N$, where

$$P_{i,s}(x) = \sum_{J \subset \Lambda_i} \sum_{H \subset \Lambda_i} \Pi_J(x)\,\Pi_H(x)\,\Pi_{\Lambda_i \setminus J}(1 - x)\,\Pi_{\Lambda_i \setminus H}(1 - x)\,\big(s_{J\cup\{i\},H} - s_{J,H}\big)$$

and $R_i(x,y)$ is a polynomial function in the variables $x(1), \dots, x(n)$ and $y(I)$ for $I \in \mathcal{P}([\![1;n]\!])$
with at least two elements such that $R_i(x,0) = 0$.
Condition (b). Computations similar to those used to obtain (7.6) lead to the following
expansion of the second moments of $X^{(N)}_1 - x$, showing that condition (b) holds:

Lemma 7.5. N^'E, [{x[^\i) - x{i)){x[^\j) - x{j))] = a,j{x,y) + 0{N-^), with 
ai4x,y) = xii)il - x{i)) + OiN-^) 



j{x,y) = -2[ Z rj)yi{i,j}) + OiN-') ifi^j 

Vc[l;nl\{i,i} ^ 



uniformly on z £ E^. 

Proof. Let i,j E [1 ;?^] and z G E]^. By definition of X^^\ 
iV^E, [{X^^\i) - xmX^^\j) - xij))] 



E E ^4(zrik)-z{mz[''\£)-z{i))] 

k€A, ki=o eeA, f,=o 



Using formulae (7.3) and (7.4) and assumption H2, we obtain 
iV^lE, [(Xf )(i) - x(i))(xf )(i) - x{j))] 



Yl ^) + ^'^))(%.=0,fc,=0} - I{fc,=0/,=0}) 

-(1) , T^(2) ^(3) ^(3) 



where 



teA ££A ki^A, ki=kj=0 

tGA keA, ki=kj=o leA 

^ff=E^(*) E E l{{i,t);k). 

teA £eA, lj=Q k&A, fe,=o 

With the convention x({i,j}) = x{i) if i = j, we have T^^- = x({i, j}) and it follows from 
assumption HI (r/ = rj for every I C Jl ; n]) that 

^ilj = 2;(i)x(i) + ^ r/( + ) (2;({i,i}) - x(i)x(i)) 

/Cll;n] 

= x(i)x(j) +2(| ^ ?-/)(2;({i,j}) -x(i)x(i)^, 

^Cll;nl\{ij} 

=x{{i,j}) + ( ^ r7)(x(i)x(j) -x({i,j})) = -(x(i)x(j) + x({i, j})) . 

^Cll;n]\{i} 
Therefore, for every $i,j \in [\![1;n]\!]$,



^Cll;n]\{i,i} 

If i = j then x{{i,j}) — x{i)x{j) = x{i){l — x{i)) and ~ 2' ^ 

Condition (d). Let $I$ be a subset of $[\![1;n]\!]$ with at least two elements. To compute the
drift of $Y^{(N)}(I)$, we use the following lemma and formulae (7.3), (7.4) and (7.5) describing the
moments of $Z^{(N)}_1 - z$.

Lemma 7.6. Let $J$ be a finite set. Consider two families of reals $\{a_j, j \in J\}$ and $\{b_j, j \in J\}$.
The following identity holds:

$$\prod_{j \in J} a_j - \prod_{j \in J} b_j = \sum_{K \subset J,\ K \ne \emptyset}\ \prod_{k \in K} (a_k - b_k) \prod_{l \in J \setminus K} b_l.$$
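The identity of Lemma 7.6 is the expansion of $\prod_j \big(b_j + (a_j - b_j)\big)$ with the term $K = \emptyset$ moved to the left-hand side. A quick numerical check, written as a sketch with arbitrary sample values (the lists `a` and `b` and the function names are assumptions for the illustration):

```python
import itertools
import math

def product_difference(a, b):
    """Left-hand side: prod(a) - prod(b)."""
    return math.prod(a) - math.prod(b)

def expansion(a, b):
    """Right-hand side: sum over non-empty K of prod_K (a - b) * prod_{J \\ K} b."""
    idx = range(len(a))
    total = 0.0
    for r in range(1, len(a) + 1):
        for K in itertools.combinations(idx, r):
            total += (math.prod(a[k] - b[k] for k in K)
                      * math.prod(b[l] for l in idx if l not in K))
    return total

a = [0.3, 1.7, -0.4, 2.0]
b = [0.5, -1.2, 0.9, 0.1]
gap = abs(product_difference(a, b) - expansion(a, b))
```

In the computation of the drift of $Y^{(N)}(I)$, the identity is applied with $a_j$ the updated allelic frequencies and $b_j = x(j)$, so that terms with $|K| \ge 3$ are controlled by the higher moments of the increments.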

Computations yield: 

iviE,[y/^)(/)-y(/)]= n E ^f(^; 

«6/ ^6/\{i} j&A, ji=0 

- E Bf\z) + 0{N-^). (7.13) 
uniformly on z G E'at. As we have shown that X^jg^ j—o ^j''^(-^) = for every i G [l;re] 



(equation (7.6)), 



iVIE,[y/^)(/) - y(/)] = - Yl (^) + (7.14) 



uniformly on z G -Eat. 

Direct computations provide the following expression of the sum on the right-hand side of
(7.14) using the variables $x(L) = \sum_{j \in \mathcal{A},\ j|_L \equiv 0} z(j)$, $L \in \mathcal{P}([\![1;n]\!])$:

$$\sum_{j \in \mathcal{A},\ j|_I \equiv 0} B^{(1)}_j(z) = \sum_{L \subset [\![1;n]\!]} r_L \big( x(I \cap L)\, x(I \cap \bar L) - x(I) \big). \qquad (7.15)$$

To obtain an expression for $\mathbb{E}_z[Y^{(N)}_1(I) - y(I)]$ in the new coordinates, it remains to replace
each term $x(L)$ for $|L| \ge 2$ with $\prod_{\ell \in L} x(\ell) - y(L)$ in (7.15). This leads to the following lemma
and shows that condition (d) holds.
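Lemma 7.7. For a subset $I$ of $[\![1;n]\!]$ having at least two elements,

$$N\, \mathbb{E}_z[Y^{(N)}_1(I) - y(I)] = c_{n,I}(x,y) + O(N^{-1}) \qquad (7.16)$$

where

$$c_{n,I}(x,y) = -\Big( \sum_{\substack{L \subset [\![1;n]\!], \\ I \cap L \ne \emptyset,\ I \cap \bar L \ne \emptyset}} r_L \Big)\, y(I) \;-\; \mathbb{1}_{\{|I| \ge 4\}} \sum_{\substack{L \subset [\![1;n]\!], \\ |I \cap L| \ge 2,\ |I \cap \bar L| \ge 2}} r_L\, y(I \cap L)\, y(I \cap \bar L)$$

$$\qquad +\; \mathbb{1}_{\{|I| \ge 3\}} \sum_{\substack{L \subset [\![1;n]\!], \\ |I \cap L| \ge 2,\ |I \cap \bar L| \ge 1}} (r_L + r_{\bar L})\, y(L \cap I) \prod_{i \in I \cap \bar L} x(i).$$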

Condition (f). The following lemma shows that the condition (f) holds under the
assumption H3:

Lemma 7.8. For two distinct loci $k, \ell$, let $r_{k,\ell}$ denote the probability that the offspring does
not inherit the genes at the loci $k$ and $\ell$ from the same parent,

$$r_{k,\ell} = \sum_{I \subset [\![1;n]\!],\ k \in I \text{ and } \ell \notin I} (r_I + r_{\bar I}),$$

and set $r(n) = \min(r_{k,h},\ k,h \in [\![1;n]\!] \text{ and } h \ne k)$.

If $r(n) > 0$ then the following system of differential equations

$$(S_{n,I}) \quad \begin{cases} \dfrac{\partial v_{n,I}}{\partial t}(t,x,y) = c_{n,I}(x, v_n(t,x,y)) \\ v_{n,I}(0,x,y) = y(I) \end{cases} \qquad \forall I \subset [\![1;n]\!] \text{ s.t. } |I| \ge 2$$

has a unique solution $v_n = \{v_{n,I},\ I \subset [\![1;n]\!] \text{ and } |I| \ge 2\}$ which is of the form:

$$v_{n,I}(t,x,y) = \exp(-r(n)t)\, f_{n,I}(t,x,y),$$

where $f_{n,I}$ is a continuous and bounded function on $\mathbb{R}_+ \times [0,1]^n \times [-1,1]^{2^n-n-1}$ such that the value
of $f_{n,I}(t,x,y)$ depends on $x$ and $y$ only via the coordinates $x(i)$ for $i \in I$ and $y(J)$ for $J \subset I$
such that $|J| \ge 2$.

Remark 7.2. For every subset $I \subset [\![1;n]\!]$ with two elements, say $k$ and $\ell$,

$$\frac{\partial v_{n,I}}{\partial t}(t,x,y) = -r_{k,\ell}\, v_{n,I}(t,x,y).$$

Therefore if $r(n) = 0$ then there exists a subset $I$ of $[\![1;n]\!]$ with two elements such that
$v_{n,I}(t,x,y) = y(I)$. Thus the assumption $r(n) > 0$ is a necessary condition for the solution of
$(S_{n,I})$ to converge to 0 as $t$ tends to $+\infty$ for any initial values.
Proof. Let $n \ge 2$ and let $I \subset [\![1;n]\!]$ be such that $|I| \ge 2$. As $c_{n,I}(x,y)$ depends only on the
coordinates $x(i)$ for $i \in I$ and $y(L)$ for $L \subset I$ such that $|L| \ge 2$, we shall prove by induction
on the number of elements of $I$ that for any $J \subset I$, $(S_{n,J})$ has a unique solution of the form
$v_{n,J}(t,x,y) = \exp(-r(n)t)\, f_{n,J}(t,x,y)$, where $f_{n,J}$ is a continuous and bounded function on
$\mathbb{R}_+ \times [0,1]^n \times [-1,1]^{2^n-n-1}$ such that the value of $f_{n,J}(t,x,y)$ depends on $x$ and $y$ only through
the values of the coordinates $x(j)$ for $j \in J$ and $y(L)$ for $L \subset J$ such that $|L| \ge 2$.

• If $I$ has two elements, say $k$ and $\ell$, then $(S_{n,I})$ is the following differential equation:

$$\begin{cases} \dfrac{\partial v_{n,I}}{\partial t}(t,x,y) = -r_{k,\ell}\, v_{n,I}(t,x,y) \\ v_{n,I}(0,x,y) = y(I) \end{cases}$$

It has a unique solution $v_{n,I}(t,x,y) = y(I)\,e^{-r_{k,\ell} t} = e^{-r(n)t} f_{n,I}(t,x,y)$ where

$$f_{n,I}(t,x,y) = e^{-(r_{k,\ell} - r(n))t}\, y(I).$$

By assumption $r_{k,\ell} \ge r(n) > 0$, hence the result holds.

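• Let $2 \le m < n$. Assume that the inductive hypothesis holds for any subset $J$ with $m$
elements. Let $I$ be a subset of $[\![1;n]\!]$ with $m + 1$ elements. Then

$$\frac{\partial v_{n,I}}{\partial t}(t,x,y) = -\tilde r_I\, v_{n,I}(t,x,y) + e^{-r(n)t} g(t,x,y)$$

where $\tilde r_I = \sum_{L \subset [\![1;n]\!] \text{ s.t. } I \cap L \ne \emptyset,\ I \cap \bar L \ne \emptyset} r_L$ and

$$g(t,x,y) = -\mathbb{1}_{\{|I| \ge 4\}} \sum_{\substack{L \subset [\![1;n]\!] \text{ s.t.} \\ |I \cap L| \ge 2,\ |I \cap \bar L| \ge 2}} r_L\, e^{-r(n)t} f_{n,I \cap L}(t,x,y)\, f_{n,I \cap \bar L}(t,x,y) + \mathbb{1}_{\{|I| \ge 3\}} \sum_{\substack{L \subset [\![1;n]\!] \text{ s.t.} \\ |I \cap L| \ge 2,\ |I \cap \bar L| \ge 1}} (r_L + r_{\bar L})\, f_{n,L \cap I}(t,x,y) \prod_{i \in I \cap \bar L} x(i).$$

As $\tilde r_I$ is the probability that the offspring does not inherit all the genes at loci $i \in I$
from the same parent, $\tilde r_I \ge r(n)$. Therefore the differential equation $(S_{n,I})$ has a unique
solution:

$$v_{n,I}(t,x,y) = y(I)\,e^{-\tilde r_I t} + e^{-\tilde r_I t} \int_0^t g(s,x,y)\, e^{(\tilde r_I - r(n))s}\, ds.$$

By our assumptions on the functions $f_{n,J}$ for $J \subsetneq I$, $g$ is a bounded continuous function
on $\mathbb{R}_+ \times [0,1]^n \times [-1,1]^{2^n-n-1}$ such that the value of $g(t,x,y)$ depends on $x$ and $y$ only
through the coordinates $x(i)$ for $i \in I$ and $y(L)$ for $L \subset I$ such that $|L| \ge 2$. Therefore,
the function $f_{n,I}(t,x,y) = e^{r(n)t} v_{n,I}(t,x,y)$ has the asserted properties.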
Conditions (c) and (e). Condition (c) is easy to verify using formulae (7.3), (7.4), (7.5)
describing the moments of $Z^{(N)}_1 - z$. This leads to:

$$N^2\, \mathbb{E}_z\big[(X^{(N)}_1(i) - x(i))^4\big] = O(N^{-2}) \quad \forall i \in [\![1;n]\!], \text{ uniformly on } z \in E_N.$$

Similarly, using Lemma 7.6, we obtain

$$N\, \mathbb{E}_z\big[(Y^{(N)}_1(I) - y(I))^2\big] = O(N^{-1}) \quad \forall I \subset [\![1;n]\!] \text{ s.t. } |I| \ge 2, \text{ uniformly on } z \in E_N.$$

□

7.2 Expressions for the drift 

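We have shown that the $i$-th coordinate of the drift of the limiting diffusion is

$$(1 - x(i))\mu_1 - x(i)\mu_0 + (1/2 - x(i))\,x(i)(1 - x(i))\,P_{i,s}(x)$$

where

$$P_{i,s}(x) = \sum_{J \subset [\![1;n]\!]\setminus\{i\}}\ \sum_{H \subset [\![1;n]\!]\setminus\{i\}} (s_{J\cup\{i\},H} - s_{J,H}) \prod_{j \in J} x(j) \prod_{h \in H} x(h) \prod_{j \in [\![1;n]\!],\ j \notin J\cup\{i\}} (1 - x(j)) \prod_{h \in [\![1;n]\!],\ h \notin H\cup\{i\}} (1 - x(h))$$

and, for two subsets $I$ and $J$ of $[\![1;n]\!]$, $s_{I,J}$ denotes the assortment parameter $s_{i,j}$ for the types
$i = (0_I, 1_{\bar I})$ and $j = (0_J, 1_{\bar J})$. The following lemma states that $P_{i,s}(x)$ is actually a polynomial
function in the variables $x(j)(1 - x(j))$ for $j \in [\![1;n]\!] \setminus \{i\}$: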

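Lemma 7.9. Let $\Lambda$ be a finite subset of $\mathbb{N}$. Consider a family of reals $\beta = \{\beta_{I,J},\ I, J \subset \Lambda\}$
such that $\beta_{I,J} = \beta_{I \setminus J,\, J \setminus I}$ for every $I, J \subset \Lambda$. Then,

$$\sum_{J \subset \Lambda} \sum_{H \subset \Lambda} \beta_{J,H} \Big( \prod_{j \in J} x(j) \prod_{h \in H} x(h) \prod_{j \in \Lambda \setminus J} (1 - x(j)) \prod_{h \in \Lambda \setminus H} (1 - x(h)) \Big) = \sum_{L \subset \Lambda} C_L(\beta) \prod_{\ell \in L} x(\ell)(1 - x(\ell)) \qquad (7.17)$$

where

$$C_L(\beta) = \sum_{T \subset L} (-2)^{|L|-|T|} \sum_{A \subset T} \beta_{A,\,T \setminus A}.$$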
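Identity (7.17) can be tested numerically on a small set. In the sketch below the family $\beta$ is generated at random subject to the constraint that $\beta_{I,J}$ depends only on the pair $(I \setminus J, J \setminus I)$, which is how the hypothesis of Lemma 7.9 is read here (an assumption, as are the set `Lam`, the seed and the function names).

```python
import itertools
import math
import random

random.seed(1)
Lam = (0, 1, 2)  # a small finite set standing in for Lambda

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in itertools.combinations(s, r)]

_values = {}
def beta(I, J):
    """Random family with beta(I, J) depending only on (I minus J, J minus I)."""
    key = (I - J, J - I)
    if key not in _values:
        _values[key] = random.uniform(-1.0, 1.0)
    return _values[key]

x = {l: random.random() for l in Lam}

def lhs_717():
    total = 0.0
    for J in subsets(Lam):
        for H in subsets(Lam):
            weight = (math.prod(x[j] for j in J)
                      * math.prod(x[h] for h in H)
                      * math.prod(1 - x[j] for j in Lam if j not in J)
                      * math.prod(1 - x[h] for h in Lam if h not in H))
            total += beta(J, H) * weight
    return total

def coeff(L):
    """C_L(beta) = sum_{T in L} (-2)^(|L|-|T|) sum_{A in T} beta(A, T minus A)."""
    return sum((-2) ** (len(L) - len(T))
               * sum(beta(A, T - A) for A in subsets(T))
               for T in subsets(L))

def rhs_717():
    return sum(coeff(L) * math.prod(x[l] * (1 - x[l]) for l in L)
               for L in subsets(Lam))

gap = abs(lhs_717() - rhs_717())
```

The mechanism behind the identity is visible in the code: a locus lying in both $J$ and $H$, or in neither, contributes $x^2 + (1-x)^2 = 1 - 2x(1-x)$, while a locus lying in exactly one of them contributes $x(1-x)$.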
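Proof. Let $P_\Lambda(\beta)$ denote the polynomial function on the left-hand side of (7.17). The proof is by
induction on $|\Lambda|$. First, $P_\emptyset(\beta)(x) = \beta_{\emptyset,\emptyset} = C_\emptyset(\beta)$.

Let $n \in \mathbb{N}$. Assume that the equality (7.17) holds for every subset $\Lambda$ of $\mathbb{N}$ with at most $n$
elements and every family of reals $\beta$ satisfying the assumptions of the lemma.
Let $\Lambda$ be a subset of $\mathbb{N}$ with $n + 1$ elements, let $j$ be an element of $\Lambda$ and let $\eta = \{\eta_{I,J},\ I, J \subset \Lambda\}$
be a family of reals such that $\eta_{I,J} = \eta_{I \setminus J,\, J \setminus I}$ for every $I, J \subset \Lambda$. We split $P_\Lambda(\eta)$ into a sum
over the subsets of $\Lambda$ containing $j$ and a sum over the subsets of $\Lambda \setminus \{j\}$ to obtain the following
expression:

$$P_\Lambda(\eta)(x) = \sum_{K \subset \Lambda \setminus \{j\}} \sum_{L \subset \Lambda \setminus \{j\}} \prod_{k \in K} x(k) \prod_{\ell \in L} x(\ell) \prod_{k \in \Lambda \setminus (K \cup \{j\})} (1 - x(k)) \prod_{h \in \Lambda \setminus (L \cup \{j\})} (1 - x(h)) \times \Big( x(j)^2 \eta_{K\cup\{j\},L\cup\{j\}} + (1 - x(j))^2 \eta_{K,L} + x(j)(1 - x(j)) \big( \eta_{K\cup\{j\},L} + \eta_{K,L\cup\{j\}} \big) \Big).$$

This expression can be simplified by using that $\eta_{K\cup\{j\},L\cup\{j\}} = \eta_{K,L}$:

$$P_\Lambda(\eta)(x) = P_{\Lambda \setminus \{j\}}(\eta^{(0)})(x) + x(j)(1 - x(j)) \Big( P_{\Lambda \setminus \{j\}}(\eta^{(1)})(x) + P_{\Lambda \setminus \{j\}}(\eta^{(2)})(x) - 2 P_{\Lambda \setminus \{j\}}(\eta^{(0)})(x) \Big),$$

where $\eta^{(0)}$, $\eta^{(1)}$ and $\eta^{(2)}$ are the following three families of reals indexed by the pairs of subsets
of $\Lambda \setminus \{j\}$:

$$\eta^{(0)}_{A,B} = \eta_{A,B}, \quad \eta^{(1)}_{A,B} = \eta_{A\cup\{j\},B} \quad \text{and} \quad \eta^{(2)}_{A,B} = \eta_{A,B\cup\{j\}} \quad \text{for every } A, B \subset \Lambda \setminus \{j\}.$$

The inductive hypothesis applies to $\Lambda \setminus \{j\}$ and the three families of reals $\eta^{(0)}$, $\eta^{(1)}$ and $\eta^{(2)}$:

$$P_\Lambda(\eta)(x) = \sum_{L \subset \Lambda \setminus \{j\}} C_L(\eta) \prod_{\ell \in L} x(\ell)(1 - x(\ell)) + \sum_{L \subset \Lambda,\ j \in L} \tilde C_L \prod_{\ell \in L} x(\ell)(1 - x(\ell))$$

where

$$\tilde C_L = \sum_{T \subset L \setminus \{j\}} (-2)^{|L|-1-|T|} \sum_{A \subset T} \big( \eta_{A\cup\{j\},\,T \setminus A} + \eta_{A,\,(T\cup\{j\}) \setminus A} - 2\eta_{A,\,T \setminus A} \big).$$

The double sum of the terms $\eta_{A\cup\{j\},\,T \setminus A} + \eta_{A,\,(T\cup\{j\}) \setminus A}$ is equal to:

$$\sum_{T \subset L,\ j \in T} (-2)^{|L|-|T|} \sum_{A \subset T} \eta_{A,\,T \setminus A}.$$

Therefore, $\tilde C_L = C_L(\eta)$ and $P_\Lambda(\eta)(x) = \sum_{L \subset \Lambda} C_L(\eta) \prod_{\ell \in L} x(\ell)(1 - x(\ell))$, which completes the
proof by induction. □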
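By Lemma 7.9, the expanded form of $P_{i,s}$ as a polynomial function of the $n - 1$ variables
$x(\ell)(1 - x(\ell))$, $\ell \in [\![1;n]\!] \setminus \{i\}$, is

$$P_{i,s}(x) = \sum_{L \subset [\![1;n]\!]\setminus\{i\}} a_{i,L}(s) \prod_{\ell \in L} x(\ell)(1 - x(\ell)) \qquad (7.18)$$

where

$$a_{i,L}(s) = \sum_{T \subset L} (-2)^{|L|-|T|} \sum_{A \subset T} \big( s_{A\cup\{i\},\,T \setminus A} - s_{A,\,T \setminus A} \big).$$

The coefficient $a_{i,L}(s)$ can be rewritten in terms of the mean values of the assortment
parameters $m_T(s)$ for $T \subset L$:

$$a_{i,L}(s) = 2^{|L|} \sum_{T \subset L} (-1)^{|L|-|T|} \big( m_{T\cup\{i\}}(s) - m_T(s) \big) = 2^{|L|} \sum_{T \subset L} (-1)^{|L|-|T|}\, \delta_i[m(s)](T).$$

Indeed, it follows from the assumption H4 that for every $i \in [\![1;n]\!]$ and $T \subset [\![1;n]\!] \setminus \{i\}$,

$$m_T(s) = 2^{-|T|} \sum_{A \subset T} s_{A,\,T \setminus A} \quad \text{and} \quad m_{T\cup\{i\}}(s) = 2^{-|T|} \sum_{A \subset T} s_{A\cup\{i\},\,T \setminus A}.$$

Using formula (4.5), we obtain $a_{i,L}(s) = 2^{|L|}\, \delta_{L\cup\{i\}}[m(s)](\emptyset)$.

The following factorised form of the polynomial function $P_{i,s}$ can be derived from a general
identity stated in Lemma A.1:

$$P_{i,s}(x) = \sum_{A \subset [\![1;n]\!]\setminus\{i\}} \delta_i[m(s)](A) \prod_{k \in A} \big( 2x(k)(1 - x(k)) \big) \prod_{\ell \notin A\cup\{i\}} \big( 1 - 2x(\ell)(1 - x(\ell)) \big).$$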

A Appendix 

A.l Combinatorial formulae for difference operators 

This section collects some combinatorial formulae used to study the limiting diffusion. Let $E$
be a finite set and $t$ be a real. For a function $f$ defined on $\mathcal{P}(E)$, we set

$$S_t(f)(A) = \sum_{B \subset A} t^{|A|-|B|} f(B) \quad \text{for every } A \in \mathcal{P}(E)$$

(with the usual convention $a^0 = 1$ for every $a \in \mathbb{R}$). Most of the combinatorial formulae used
in the paper can be deduced from this general identity:

Lemma A.1. Let $U$ be a subset of $E$ and let $\{x_u, u \in U\}$ be a family of reals.

$$\sum_{A \subset U} S_t(f)(A) \prod_{i \in A} x_i = \sum_{B \subset U} f(B) \prod_{i \in B} x_i \prod_{j \in U \setminus B} (1 + t x_j). \qquad (A.1)$$
AcU i€A BCU ieB jeU\B 
Proof. One way to derive this equality is to interchange the sum on the right-hand side of the
equation with the sum that appears in the definition of $S_t(f)(A)$, to use the new summation
index $C = A \setminus B$ and to recognize the following expansion of the product of the terms $1 + t x_i$:

$$\prod_{i \in U \setminus B} (1 + t x_i) = \sum_{C \subset U \setminus B} \prod_{i \in C} t x_i.$$

□

As $S_{-1}(f)(A)$ is nothing other than $\delta_A[f](\emptyset)$ by (4.5), if we apply Lemma A.1 with $t = -1$,
$f(A) = \delta_i[m(s)](A)$ and the family of reals $\{2x(j)(1 - x(j)),\ j \in [\![1;n]\!] \setminus \{i\}\}$, we obtain the
following equality

$$\sum_{A \subset [\![1;n]\!]\setminus\{i\}} 2^{|A|}\, \delta_{A\cup\{i\}}[m(s)](\emptyset) \prod_{\ell \in A} x(\ell)(1 - x(\ell)) = \sum_{A \subset [\![1;n]\!]\setminus\{i\}} \delta_i[m(s)](A) \prod_{k \in A} 2x(k)(1 - x(k)) \prod_{\ell \notin A\cup\{i\}} \big( 1 - 2x(\ell)(1 - x(\ell)) \big).$$

This shows the equality between the expanded form (4.3) and factorised form (4.1) of the
polynomial term $P_{i,s}(x)$ appearing in the drift of the limiting diffusion.

By taking $x_i = -1/t$ for every $i \in U$ in Lemma A.1, we can deduce the inverse of the
operator $S_t$. This gives a useful formula for inverting a relation between two sequences indexed
by the subsets of a finite set.

Corollary A.1. The inverse of the operator $S_t$ is $S_{-t}$, that is

$$f(A) = \sum_{B \subset A} (-t)^{|A|-|B|}\, S_t(f)(B) \quad \text{for every } A \subset E.$$

From Corollary A.1 we can deduce the following identity for the finite difference operator:

$$f(A) = \sum_{B \subset A} \delta_B[f](\emptyset) \quad \text{for every } A \in \mathcal{P}(E). \qquad (A.2)$$
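Both Lemma A.1 and Corollary A.1 are finite identities and can be checked numerically. The sketch below uses a random function `f` on the subsets of a four-element set, an arbitrary value of $t$ and arbitrary reals `xs`; all of these, and the function names, are assumptions made for the illustration.

```python
import itertools
import math
import random

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in itertools.combinations(s, r)]

def S(t, f, A):
    """S_t(f)(A) = sum over B in A of t^(|A|-|B|) f(B)."""
    return sum(t ** (len(A) - len(B)) * f[B] for B in subsets(A))

random.seed(2)
E = frozenset({0, 1, 2, 3})
f = {A: random.uniform(-1.0, 1.0) for A in subsets(E)}
t = 0.7

# Corollary A.1: applying S_{-t} after S_t gives back f.
Sf = {A: S(t, f, A) for A in subsets(E)}
back = {A: S(-t, Sf, A) for A in subsets(E)}
inverse_gap = max(abs(back[A] - f[A]) for A in f)

# Lemma A.1 with U = E and arbitrary reals x_u.
xs = {i: random.uniform(-1.0, 1.0) for i in E}
left = sum(Sf[A] * math.prod(xs[i] for i in A) for A in subsets(E))
right = sum(f[B] * math.prod(xs[i] for i in B)
            * math.prod(1 + t * xs[j] for j in E - B)
            for B in subsets(E))
lemma_gap = abs(left - right)
```

Taking `t = -1` in `S` gives the subset finite difference $\delta_A[f](\emptyset)$, so the same code also illustrates (A.2).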

By considering the operator St for a function / which is constant on subsets having the same 
number of elements, we can rewrite the previous relations to obtain useful formulae relating 
two sequences indexed by the integers 0, 1, . . . , n. 

Corollary A.2. Let $t$ be a real number. Let $n \in \mathbb{N}^*$. For a function $f$ defined on $[\![0;n]\!]$, let
$s_t(f)$ be the function defined by:

$$s_t(f)(k) = \sum_{i=0}^{k} \binom{k}{i} t^{k-i} f(i) \quad \text{for every } k \in [\![0;n]\!].$$

Then,
1. For every $x \in \mathbb{R}^n$

n n 
3=0 i=0 iC[l;n] s. t. \L\=l i<^L jG[l;n]\L 

where $e_{n,j}$ denotes the elementary symmetric polynomial of degree $j$ in $n$ variables:

JC[[l;n] s. t. \J\=j i&J 

2. The operator $s_{-t}$ is the inverse of the operator $s_t$:

$$f(k) = \sum_{i=0}^{k} \binom{k}{i} (-t)^{k-i}\, s_t(f)(i) \quad \text{for every } k \in [\![0;n]\!].$$

This corollary provides identities for the forward finite difference operators of any orders
since $s_{-1}(f)(k) = \delta^{(k)}[f](0)$ for every $k \in [\![0;n]\!]$. In particular, this leads to the following
formula used in the proof of Proposition 5.1:

$$\sum_{i=0}^{k} \binom{k}{i}\, \delta^{(i)}[f](0) = f(k) \quad \text{for every } k \in [\![0;n]\!] \qquad (A.3)$$

and Lemma 6.2 used in the proof of Proposition 6.3.
A.2 Example 6.2 

Under the hypotheses of assertion 1-(b) of Proposition 6.3, the logarithm of the stationary
density $h_{n,s,\mu}$ takes its maximum value in $[0, 1/2]^n$ at a unique point $(\zeta_0, \dots, \zeta_0)$ such
that $\lambda_0 = \zeta_0(1 - \zeta_0)$ is the unique solution in ]0, 1/4[ of the equation $\mathcal{E}_0$:



2^-1 + 1^' 2H^^+^^ [m] (0) = 0. 

In $[0,1/2]^n$ the saddle points of index $n - 1$ have one coordinate equal to 1/2 and $(n - 1)$
coordinates equal to $\zeta_1$ where $\lambda_1 = \zeta_1(1 - \zeta_1)$ is the unique solution in ]0, 1/4[ of the equation

fc=o ^ ^ 
If we denote by $h_{n,i}$ the value of $h_{n,s,\mu}$ at a critical point of index $n - i$ then

n-l ^ ^ 

Vo - K,n =(2a^ + l)n ln(4Ao) + ^ 2" 6^''+'^ [m] (0) ( ^ J (A^^^ - (1/4)'=+^) 



hn,o-hn,i =(2^ + l)(nln(^) + ln(4Ai) 



+ g 2^5^^^ [m] (0) ( ; J) (A^-^^ - I{.<n-2) + (Ao^^^ - ^ A? )) . 

If we define the assortment by means of the Hamming criterion with the quadratic sequence of
parameters: $s_k = s_0 - (bk + ck^2)$ $\forall k \in [\![0;n]\!]$ with $c > 0$ and $b + c > 0$, then

$$\delta^{(1)}[m](k) = -(b + c + 2kc)\ \ \forall k \in [\![0;n-1]\!], \quad \delta^{(2)}[m](0) = -2c \quad \text{and} \quad \delta^{(r)}[m](0) = 0\ \ \forall r \ge 3.$$

In this case, $\lambda_0$ and $\lambda_1$ are solutions of quadratic equations: $2\mu - 1 - (b+c)\lambda_0 - 4c(n-1)\lambda_0^2 = 0$ and
$2\mu - 1 - (b+2c)\lambda_1 - 4c(n-2)\lambda_1^2 = 0$. After some computations, we obtain: $h_{n,0} - h_{n,n}$



c^2 



and hno — hnA ~ n^/'^l/2Wc{2fi — 1). 

' ' n— >-+oo 

A. 3 Property of a symmetric matrix 

The following lemma is used to determine the nature of the critical points of the density of 
the invariant measure (Proposition 6.3). 

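Lemma A.2. For a real $a$ and two integers $k$ and $n$ so that $n \ge 1$ and $0 \le k \le n$, let $M_{n,k}(a)$
denote the following symmetric matrix:

$$M_{n,k}(a) = \begin{pmatrix} A_k & B_{k,n-k} \\ B_{n-k,k} & A_{n-k} \end{pmatrix}$$

where

• $A_k$ denotes the following $k$-by-$k$ matrix:

$$A_k = \begin{pmatrix} 1 & a & \cdots & a \\ a & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & a \\ a & \cdots & a & 1 \end{pmatrix}$$

• $B_{k_1,k_2}$ denotes the $k_1$-by-$k_2$ matrix all the elements of which are equal to $-a$.

If $0 \le a < 1$ then $M_{n,k}(a)$ is positive definite.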
Proof. Let $Q_{n,k,a}$ denote the quadratic form with matrix $M_{n,k}(a)$ in the canonical basis. For
every $x \in \mathbb{R}^n$, $Q_{n,k,a}(x) = \sum_{i=1}^{n} x_i^2 + 2a \sum_{1 \le i < j \le n} \epsilon_i \epsilon_j x_i x_j$, where $\epsilon_1 = \dots = \epsilon_k = 1$ and
$\epsilon_{k+1} = \dots = \epsilon_n = -1$. This lemma can be established by induction on $n$ by using the following
decomposition of $Q_{n,k,a}(x)$:

$$Q_{n,k,a}(x) = \Big( x_n + a\,\epsilon_n \sum_{i=1}^{n-1} \epsilon_i x_i \Big)^2 + (1 - a^2) \Big( \sum_{j=1}^{n-1} x_j^2 + 2b \sum_{1 \le i < j \le n-1} \epsilon_i \epsilon_j x_i x_j \Big)$$

where $b = \frac{a}{1+a} \in [0, 1[$. □
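Lemma A.2 can be illustrated numerically. The sketch below builds $M_{n,k}(a)$ from the signs $\epsilon_i$ used in the proof and tests positive definiteness by attempting a Cholesky factorisation, which succeeds exactly when a symmetric matrix is positive definite. The function names and the sampled values of $n$, $k$ and $a$ are assumptions made for the illustration; this is a numerical check, not a substitute for the proof.

```python
import math

def build_M(n, k, a):
    """M_{n,k}(a): ones on the diagonal, entry a*eps_i*eps_j off the diagonal."""
    eps = [1] * k + [-1] * (n - k)
    return [[1.0 if i == j else a * eps[i] * eps[j] for j in range(n)]
            for i in range(n)]

def is_positive_definite(M):
    """Attempt a Cholesky factorisation M = L L^T with a strictly positive diagonal."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                if s <= 0.0:
                    return False   # non-positive pivot: not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

# All sampled cases with 0 <= a < 1 should be positive definite.
all_ok = all(is_positive_definite(build_M(n, k, a))
             for n in range(1, 7)
             for k in range(n + 1)
             for a in (0.0, 0.3, 0.9))
# At a = 1 the matrix has rank one, so definiteness fails for n >= 2.
boundary = is_positive_definite(build_M(3, 1, 1.0))
```

Since $M_{n,k}(a) = (1-a)I + a\,\epsilon\,{}^t\epsilon$, its eigenvalues are $1 - a$ and $1 + (n-1)a$, which gives an independent way to see why the condition $a < 1$ cannot be relaxed.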
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